MENTAL TESTS AND LINGUISTIC ABILITY

STEPHEN S. COLVIN
Brown University
and
RICHARD D. ALLEN
Director of Research and Guidance, Public Schools, Providence

When Yerkes in 1916 published his Point Scale Revision of the
Binet Tests, he included among his findings data relating to the
social and racial status of children tested. He concludes "that
conditions which are in part describable as sociological are correlated
with differences in intellectual performance, which may amount to
as much as 30 per cent of the total." He says: "In view of this
fact, which our results amply demonstrate, it is obviously unfair
to judge by the same norm of intelligence two children, the one of
whom comes from an excellent home and neighborhood and the other
from a medium to poor home and neighborhood."

This caution, so definitely stated by Yerkes, has by no means
always been followed and the result is that conclusions, in many
cases apparently unwarranted, or at least unproven, have been drawn
in regard to the levels of intelligence of various social, economic
and industrial groups.

Even Yerkes himself has accepted the results of the Army Tests
as definitely showing the fact that there are clear grades of
intelligence among various occupational groups. He places at the top
the professional group and at the bottom the unskilled laborer.³ While
it is probable that this classification has some justification, it is
evident that workers in various occupations represent individuals
varying in economic and social status.

¹ Yerkes, Robert M., Bridges, James W. and Hardwick, Rose E.: "A Point
Scale for Measuring Mental Ability." Warwick and York.

² Op. cit., p. 82.

³ "Army Mental Tests." New York, 1920, pp. 1C7-200.

Eminent authorities seem to hold to the opinion that environment
is of but slight importance in determining the results of mental
testing. Terman¹ believes "that the environment of the home affects
the results but little."² Even "limited acquaintance with the
language employed in the examination does not put the subject to great
disadvantage in many of the tests." However, if it is true, as it
seems reasonable to conclude, that mental tests are to be relied
upon only when those who are tested have had the same or at least
similar opportunities to become acquainted with the materials of the
tests and the same interests in learning about these materials, then
these environmental factors are of clear significance. In linguistic
tests in particular it is of large moment that those tested have a
substantially similar familiarity with the words employed and have a
similar skill in the use of these words.

The dependence of the ability to secure average or high scores in
intelligence tests of the verbal type on linguistic knowledge and
skill has been pointed out by several investigators. Whipple, for
example, in a summary of his results secured at the University of
Michigan says³ that 94 per cent of the students tested received in the
Army Alpha grades of B or better, and adds: "of the remaining 6
per cent, several were students of foreign extraction whose low score
must have been in a considerable measure produced by a lack of ready
command of English."

Burt,⁴ who has recently given the Binet-Simon scale to London
school children and has worked over his findings by careful statistical
methods, concludes that various factors affect the result. "Sex
influences it but little; social status rather more; educational and
particularly linguistic attainments more profoundly than any other
factor measurable with exactitude."


In a paper read last February before the National Society for
the Study of Education in its meeting at Chicago, Rugg gave an
interesting summary of the correlation coefficients obtained between
intelligence tests and school attainment. He finds in his review of
various investigations that the coefficients are only moderate between
scores in intelligence tests and achievement in "most non-verbal"
educational tests. They are higher in the case of "somewhat verbal"
educational tests and they are conspicuously high in the case of the
"most verbal" educational tests. The correlations between scores in
verbal intelligence tests and vocabulary and reading attainments are
particularly high.

¹ "The Intelligence of School Children." Houghton, Mifflin, Boston,
1919, p. 14.

² Op. cit., p. 12.

³ Twenty-first Yearbook of the National Society for the Study of
Education, p. 268. Public School Publishing Co., Bloomington, Ill.

⁴ Burt, Cyril: "Mental and Scholastic Tests." Published by the London
County Council, 1921, p. 20S.

Sometimes, no doubt, fluency in the use of the vernacular results
in an individual receiving a higher score in an intelligence test of the
linguistic type than his real mentality warrants. Downey is of the
opinion that "undoubtedly the most important source of error in
judging intelligence is the undue emphasis laid upon verbal fluency
as a measure of social and general intelligence." This probably is a
significant factor in the selection of college students for the Greek
Letter fraternities. A freshman who can talk makes a decided
impression on his fellow students, and in the hurry of the "rushing
season" this first impression has an undue influence. Recently at Brown
it has been found that the members of the freshman class pledged to
fraternities received the following distribution in intelligence
scores: best two-fifths, 31 per cent; lowest two-fifths, 56 per cent.
Among the unpledged men, excluding those cases where because of various
affiliations, etc., the men would not be "rushed," the distribution
was: best two-fifths, 61 per cent; lowest two-fifths, 31 per cent. This
is a striking demonstration of the intellectual superiority of the
unpledged men. It is fair to assume that the pledged students in the
entering class made an impression because of their verbal fluency quite
out of proportion to their real ability. Apparently, too, this verbal
fluency was but "skin deep." Clearly it did not serve the men who
possessed it substantially when they were subjected to the test of the
psychological examination. Goddard, like Downey, believes that high
grades of mental deficiency are liable to go undetected when
accompanied by verbal fluency.
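The fraternity figures can be set against a simple baseline. The sketch below is illustrative only: it assumes that the fifths were cut on the whole freshman class, so that a group unrelated to score would place 40 per cent of its members in each two-fifths band, and that whatever is not reported falls in the middle fifth. Neither assumption is stated in the text, and the names used are hypothetical.

```python
# Shares (per cent) of each group in the best and lowest two-fifths of the
# class on the intelligence score, as quoted in the text.
reported = {
    "pledged": {"best_two_fifths": 31, "lowest_two_fifths": 56},
    "unpledged": {"best_two_fifths": 61, "lowest_two_fifths": 31},
}

# Assumption: fifths were cut on the whole class, so a group unrelated to
# score would place 40 per cent of its members in each two-fifths band.
EXPECTED_SHARE = 40


def departures_from_expected(shares):
    """Return each band's departure from the 40 per cent baseline and the
    share left over for the middle fifth."""
    result = {band: pct - EXPECTED_SHARE for band, pct in shares.items()}
    result["middle_fifth_share"] = 100 - sum(shares.values())
    return result


summary = {group: departures_from_expected(s) for group, s in reported.items()}
for group, row in summary.items():
    print(group, row)
```

On these assumptions the pledged men exceed the baseline in the lowest two-fifths by 16 points, and the unpledged men exceed it in the best two-fifths by 21 points.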

However, while verbal ability may raise intelligence scores in
some instances above the level of the actual intelligence of the person
examined, its most marked effect is noted under the conditions when
the lack of such a facility unduly lowers such scores.

The attention of one of the writers (C.) was first definitely called
to this fact when in the winter of 1919-1920, the Otis Group
Intelligence Scale, Forms A and B, was given under his direction by the
teachers in the elementary schools of Brookline, Massachusetts,
to 1877 children in Grades IV to IX inclusive.¹

The scores obtained were considerably above the Otis norms,
particularly in the so-called "better districts" in Brookline. Otis
Tests given at about the same time by Warren W. Coxe to the public
school children of Cincinnati, Ohio, obtained quite different results.
In this latter instance, the norms were below those published by
Otis. The tests in both cities seem to have been given with due
care and the disparity of results can hardly be due to differences
of method or caution in administration. On the other hand, it cannot
be assumed that a child in Brookline of 12 years of age has a mental
age on the average 2 years in advance of a Cincinnati child of the
same chronological age. The pronounced differences in scores
between the Brookline children and the Cincinnati children may
reasonably be attributed to differences in opportunities to learn words
and acquire skill in their use. This conclusion is supported by the
fact that the children in the poorer localities in Brookline did not
score as high in the entire test as did the children in the more favored
localities. However, in the arithmetic test (largely non-verbal in
its nature) their scores were not inferior to those made by the children
in the "better" localities.

Another piece of evidence in support of the above viewpoint has
been found in examining the records of Brown University students in
the light of their psychological tests. As a rule, men who score
low in their psychological tests make a poor record in college. For
example, out of a total of 95 men in last year's junior and senior
classes who received low grades in their combined psychological tests,
70 have done distinctly poor work, 17 fair work, and only 8 good work.
On investigating more carefully the records of these 8 men, it was found
that at least 2 suffered from language handicaps, while the others were
either slow thinkers, were indifferent or suffered emotional upsets.
Further investigations have revealed at least 10 men in Brown
University who have received psychological scores decidedly lower than
their real mental ability because of language deficiencies. By
deficiencies is not necessarily meant inability to speak English with
correct pronunciation and reasonable fluency. The most conspicuous
examples of language deficiencies have been found among students of
Italian parentage. These students were born in America and have
received their education in the public schools. They, however, have
lived in the Italian section and at home frequently converse in Italian
and think in Italian. They have a limited English vocabulary and
tend to think slowly in English.

¹ Journal of Educational Research, Vol. III, No. 1, Jan., 1921.

Recently the writer (A.), Director of Research and Guidance in the
public schools of Providence, has made an investigation of 50 children
of American parentage and 50 of Italian parentage in the public schools
of the city. The school status of these two groups of children is
practically identical. They are taken from Grades V to VIII, inclusive;
the ages of the American born children range from 11 to 16, those of the
Italian born from 11 to 15; the average chronological age of the
American born is 13.12 years, of the Italian 13.11 years; the average
pedagogical age of the American children is 11.77 years, of the Italian
12. The school records of the Italian group indicate them to be in
every way the equal in intelligence of the American group. When
measured by the Stanford-Binet scale, the average IQ's as well as
the distribution of the IQ's are only slightly in favor of the American
group. However, when tested by the National Intelligence Tests,
the average total score of the American children is 103; of the Italian
children 90. In one test alone are the Italian children on the average
equal to the American children and that is in the arithmetic test.
Further, in the American group the agreement between the Binet
scores and the National scores is much closer than in the Italian
group, the Italian children scoring decidedly lower in the National
Tests than would be expected in the light of their school standing and
of their IQ's derived from the Stanford-Binet. The explanation of
these contradictions seems obvious. The Italian children are suffering
from a language handicap; hence their intelligence, as determined by
their scores in the verbal group tests, is rated decidedly too low.
These facts are expressed in tabular form in Table I.
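The size of the discrepancy can be read directly from the group averages in Table I. The following is a minimal sketch of that arithmetic; the class and function names are illustrative and form no part of the original study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GroupAverages:
    """Group averages as given in Table I."""
    terman_iq: int       # average IQ, Stanford-Binet (Terman)
    national_iq: int     # average IQ, National Intelligence Tests
    national_score: int  # average total score, National Intelligence Tests


groups = {
    "American": GroupAverages(terman_iq=92, national_iq=86, national_score=103),
    "Italian": GroupAverages(terman_iq=91, national_iq=76, national_score=90),
}


def iq_drop(g):
    """Points by which the National IQ falls below the Terman IQ."""
    return g.terman_iq - g.national_iq


drops = {name: iq_drop(g) for name, g in groups.items()}

# Gap between the two groups under each test.
terman_gap = groups["American"].terman_iq - groups["Italian"].terman_iq
national_gap = groups["American"].national_iq - groups["Italian"].national_iq

print(drops)                     # drop within each group
print(terman_gap, national_gap)  # gap between the groups on each test
```

The two groups stand 1 point apart on the individual test and 10 points apart on the group test; the National rating falls 6 points below the Terman rating for the American children and 15 points below it for the Italian children.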

Diagram I shows in detail the relation in the two groups between
the Terman and the National scores expressed in mental ages. It
will be observed that while the children in both groups tend to be
rated lower in mental age by the National than by the Terman Tests,
the difference is considerably greater in the Italian than in the
American group. It is reasonable to suppose that the individual
Terman tests are a more accurate determination of intelligence than
the Group National Tests; further that the linguistic factor is less
important in the former than in the latter tests. It would seem
evident, then, that the National Tests do not give an accurate means of
determining the mental ages of the Italian pupils and that the verbal
factor has contributed in lowering the individual scores. It is to
be noted further from the diagram that the Terman and the National
Tests for both groups are more closely in agreement at the higher
mental ages than at the lower. The most striking divergence begins
at about the mental age of 10, while from 11 to 15, including 19
pupils in the Italian group and 20 in the American group, the
differences are unimportant.

Another comparison shows the greater agreement in the American
group than in the Italian group between the mental ages derived from
these two tests. By consulting Table II we find listed for the
American group the mental ages of each pupil according to the National
scores (N) and according to the Terman scores (T); also the
difference, plus and minus, between the N mental ages and the T mental
ages under column D. For the highest 25 in mental ages the total
difference is 17.5, 10 plus and 7.5 minus, a very substantial
agreement. For the lowest 25 the deviation totals 38, all plus.

In Table III we have a similar comparison for the Italian group.
Here we find for the higher 25 in mental age a total difference of 21,
with a plus difference of 14. In the lowest 25 we find no minus
differences and a total plus difference of 63. Here again is clear
evidence that there is a much greater difference in the mental ages
derived from the two tests in the Italian group than in the American
group and also that in the Italian group the mental ages derived from
the Terman scores are significantly higher than those ages derived from
the National scores.
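The totals quoted from Tables II and III are sums of the signed differences in column D. A minimal sketch of that summary follows. It assumes that D is the Terman mental age minus the National mental age, so that a plus difference means the National test rated the pupil lower. The five D values in the example are hypothetical, and the function name is illustrative.

```python
def summarize_differences(differences):
    """Split signed differences D into the plus total, the minus total
    (reported as a positive amount) and their combined total."""
    plus = sum(d for d in differences if d > 0)
    minus = -sum(d for d in differences if d < 0)
    return {"plus": plus, "minus": minus, "total": plus + minus}


# Hypothetical D values for five pupils, for illustration only.
example = summarize_differences([1, 0, -0.5, 2, -1])

# Total differences (in years of mental age) reported in the text for each
# block of 25 pupils; 10 plus and 7.5 minus make the 17.5 quoted first.
reported_totals = {
    ("American", "highest 25"): 17.5,
    ("American", "lowest 25"): 38,
    ("Italian", "highest 25"): 21,
    ("Italian", "lowest 25"): 63,
}

# Average difference per pupil within each block.
mean_difference = {key: total / 25 for key, total in reported_totals.items()}

print(example)
for key, value in mean_difference.items():
    print(key, round(value, 2))
```

Expressed per pupil, the lowest 25 Italian children differ by about 2.5 years of mental age between the two tests, against about 1.5 years for the lowest 25 American children.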

A detailed examination of Table IV, which shows the relation
between the gross scores in the National Tests and grade placement
for both groups, reveals the following facts: 15 Italian children
received scores of 100 or above. All of these are found in the two
upper grades. Twenty-three American children received similar
scores, and 6 of this number are in Grade VI. Thirty-five Italian
children received scores below 100 and of these 19 were in the two
upper grades, while of the American children scoring below 100 only 5
were in the two upper grades. In the two lower grades are found 16
Italian children and 22 American children with such scores. Thus
judged by school attainment the Italian children are clearly less
accurately classified in intelligence by the National Tests than are
the American children. The objection may be raised that the school
attainment of the two groups may not be a valid indication of their
respective abilities, since the Americans and the Italians may be
judged by different standards. This is, however, not likely to be the
case since the American and the Italian children were in the same
schools and in the same rooms and in about equal proportions.
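The counts drawn from Table IV can be checked against one another and reduced to a single comparison. The sketch below uses only the figures quoted in the text; the class and field names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GradePlacement:
    """Counts for one group of 50 pupils, as quoted from Table IV."""
    high_score_total: int  # National score of 100 or above
    low_score_upper: int   # score below 100, placed in the two upper grades
    low_score_lower: int   # score below 100, placed in the two lower grades

    @property
    def low_score_total(self):
        return self.low_score_upper + self.low_score_lower

    @property
    def share_of_low_scorers_in_upper_grades(self):
        return self.low_score_upper / self.low_score_total


italian = GradePlacement(high_score_total=15, low_score_upper=19, low_score_lower=16)
american = GradePlacement(high_score_total=23, low_score_upper=5, low_score_lower=22)

for name, group in (("Italian", italian), ("American", american)):
    print(name, group.low_score_total,
          round(group.share_of_low_scorers_in_upper_grades, 3))
```

More than half of the Italian children scoring below 100 (19 of 35) had nevertheless reached the two upper grades, against fewer than one in five of the American children (5 of 27).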

Another significant piece of evidence indicating that the Italian
children in the Providence public schools suffer in their tests from
a linguistic handicap is shown from comparative scores of a group of
American and of Italian children in Grade VA. One hundred and
seventy-three American children and 163 Italian children were given
the Lippincott-Chapman Classroom Products Survey Tests. These
tests are composed of two sub-tests in arithmetic (fundamentals and
problems) and two reading sub-tests (selections and continuous
passages). These tests, although designed to measure school
achievement, are in reality for all essential purposes intelligence
tests, since they are made up of elements commonly found in
intelligence tests. Tables V to IX inclusive indicate the distribution
of scores made by individual pupils.

In these tables the figures at the left under "score" give the
standards of achievement for the various levels of the various grades
according to the published norms of Chapman. In the present article
the highest levels, 9 and 9+, have been combined under "nine,"
and the lowest levels, 1 and 1-, have been combined under "one."


A study of these tables shows that the two groups are nearly equal
in ability in fundamentals in arithmetic. Here language plays no
definite part. In arithmetic problems the language handicap
begins to be felt, as shown by the greater number of Americans
above the grade median. In both of the reading tests most of the
Americans are above the grade median, while most of the Italians
are below. Chart IX shows how the language handicap affects the
total score made by the Italians.
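The comparison with the grade median can be put in figures. The sketch below takes the totals at each score level as read from Tables VI and VII (each column adds to its grand total of 173 or 163) and computes the share of each group standing above the median level of 5. The function and variable names are illustrative.

```python
SCORE_LEVELS = [9, 8, 7, 6, 5, 4, 3, 2, 1]  # highest to lowest
MEDIAN_LEVEL = 5                            # grade median in the tables

# Number of pupils at each score level, as read from Tables VI and VII.
tables = {
    "arithmetic problems": {
        "American": [34, 23, 34, 34, 36, 10, 0, 2, 0],
        "Italian": [8, 10, 22, 48, 41, 22, 8, 3, 1],
    },
    "reading selections": {
        "American": [31, 14, 24, 43, 46, 10, 3, 1, 1],
        "Italian": [1, 1, 3, 10, 33, 39, 30, 28, 18],
    },
}


def count_above_median(counts):
    """Pupils whose score level lies above the grade median."""
    return sum(n for level, n in zip(SCORE_LEVELS, counts) if level > MEDIAN_LEVEL)


def share_above_median(counts):
    return count_above_median(counts) / sum(counts)


for test, by_group in tables.items():
    for group, counts in by_group.items():
        print(test, group, count_above_median(counts),
              round(share_above_median(counts), 2))
```

In arithmetic problems 125 of the 173 Americans and 88 of the 163 Italians stand above the median; in reading selections the figures are 112 and 15.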

Still another bit of corroborative evidence as to the fact that
linguistic ability plays an important rôle in tests of the verbal type
is furnished by results obtained from time to time through tests
given in the Mary C. Wheeler School for girls. This school is a
high grade private school located on the East Side in the city of
Providence and draws its children largely from families of superior
social and economic status. The Otis Tests given in this school in
the academic year 1919-20 "show, from the fourth to the seventh
grades inclusive, that considerably more than half of the pupils
tested fall in the classes near genius, very superior, or superior, while
only a small percentage are below the average."¹ Later tests made
during the academic year 1921-22 are in agreement with these earlier
findings. Tests given in Grades VI and VII to 29 children show all
to be above the Otis norms. Thirty-six children given the National
Tests in Grades V, VI and VII have with a single exception IQ's
at 100 or above, 27 being above. The median IQ is 116, the highest
160 and the lowest 96. The school achievement does not indicate
that these children are of marked superior ability. Although 60 per
cent of the children are under age for their grades, their teachers are
of the opinion that as a group they are not exceptionally bright and
consider that in most instances their intelligence scores give them
too high a mental rating.

¹ Journal of Educational Research, op. cit.

The above results would seem to indicate that the opinion of
Phillips,¹ although possibly an over-statement, has a large element of
truth. He says, "I have demonstrated and can demonstrate that, of all
the intelligence tests yet published, 75 per cent of the questions
depend more on experience, on associations and on the general and
specific education of the individual than on native intelligence. Of
the 41 children found in California having a superior IQ all save one
belonged to families of culture and intelligence, and of which one or
both parents were college graduates. Association in environments, not
native intelligence, qualified them to answer."

The writer (C.) is moved to add in comment the following: All
intelligence tests yet published depend for their validity on experience,
on association and on the general and specific education of the
individual. However (and here must be added the qualifying statement to
prevent those who read these opinions from making a wrong inference
as to the worth of these tests), such tests, depending as they do
absolutely on the experiences of those tested, are valid in showing
differences in native mentality when, and only when, those tested have
had common experiences and similar interests.

From the evidence now in our possession we can reasonably
conclude that linguistic ability is an important factor in the score
obtained by an individual in an intelligence test that is based largely
on words and their uses. This ability must definitely be considered
whenever an individual in a group tested deviates in any marked degree
in this ability from his fellows in the group. How to discover this
deviation in linguistic ability is, however, in most instances a
difficult task. Of course when the individual examined is foreign born,
and when he speaks English in a halting and inaccurate way, he can be
easily singled out for attention. It has been pointed out, however,
that an individual may possess a pronounced linguistic handicap and
still use the vernacular with a reasonable accuracy and fluency. It is
to be remembered that the possession of unusual linguistic fluency
gives an individual a decided advantage in an intelligence test and
may result in the securing by him of a score that places his apparent
mentality considerably above what it really is. Such cases as these
clearly cannot be detected by ordinary observation. Various ways
suggest themselves as means by which a linguistic handicap or unusual
verbal fluency may be discovered.

¹ American Education, Vol. XXVI, No. 1022, p. 61.

1. In testing a group of children or adults to determine their
mentality, a preliminary test to discover their reading ability might
be employed. For instance the various tests devised by Thorndike to
measure the extent of vocabulary and the understanding of paragraphs
read silently suggest themselves. However, such tests would help us
in our own particular problem little, if at all, and for this quite
apparent reason: the differences in reading ability discovered might be
due to one of two causes, or to both combined. The differences in the
scores obtained by various individuals might result from actual
differences in intelligence, or from differences in opportunities to
acquire familiarity with and fluency in the vernacular. Without further
evidence there would be no way of determining the presence of these two
factors or their relative amounts.

2. A preliminary survey might be made of the social and economic
status of the individuals tested. When this status is decidedly below
that of the average of the group, there would be presumptive evidence
of a significant language handicap. In particular, children and
adults living with foreign born and largely non-English speaking
groups should be singled out for further investigation. On the other
hand those coming from particularly favored environment where the
social status is high and the cultural influences superior should also be
given special consideration, since their intelligence scores are likely to
indicate a mentality in excess of that which they actually possess.

3. In testing a group it would be well to employ a number of mental
examinations. Our belief is that for children in the upper grades,
youth in the high school, and adults in our colleges and universities,
the procedure should be somewhat as follows: There should be two
tests of the linguistic type given on succeeding days, or weeks, under
conditions as to time, place, and methods of administration as nearly
identical as possible. These tests may be either two forms of the same
mental examination, or two different examinations, preferably the
latter. In any event each of these examinations should be preceded by
a simple fore-examination to acquaint all with the nature of the tests
to be employed. Later, a third mental examination of the performance
type should be given. When there is substantial agreement for a given
individual in the results of these separate examinations then the
scores may be taken as expressing reasonably well the actual mentality
of that individual. When there is a marked disagreement this means
that such an individual will require further investigation, such as
interviews and testing by individual examinations of the Binet and
performance types. In this connection it should be pointed out that
we need better developed and more completely standardized tests of
the group performance type, adapted to the intelligence of older
children and adults. The majority of the performance tests so far
devised have been prepared for children in the primary grades,
illiterates and non-English speaking adults. These are not difficult
enough, as a rule, to discriminate between the higher levels of
mentality. One test, however, the Myers' Mental Measure,¹ is a group
examination entirely of the non-linguistic variety, designed for all
grades from the kindergarten to the university. Dearborn has
likewise extended his Non-linguistic Tests to cover all grades through
the high school. The writer (C.) has not employed this test in his
own work, but published accounts seem to indicate that it has been
reasonably successful in measuring native intelligence.²

4. When intelligence tests are employed primarily for the purpose
of promotion and elimination, particularly for the latter purpose, no
pupil or student who scores conspicuously low should be dealt with
entirely on the basis of such mental examinations. These pupils and
students should be further investigated. Their previous academic
records should be studied, the opinions of their teachers obtained,
their social and economic status investigated, etc. There should be
at least one personal interview, and a further mental examination of
the individual type given.

¹ Twenty-first Yearbook of National Society for the Study of Education,
Chapter IV. Public School Publishing Co.

² Perhaps a note here may not be out of place in regard to the relation
between non-verbal tests and linguistic ability. It does not seem to
follow necessarily when a test is employed in which no words are used
that for this reason linguistic ability is not brought into play,
particularly when rational processes are involved. The writer (C.)
constantly catches himself when working with non-verbal tests using
inner speech which becomes more and more conscious when a problem
increases in difficulty and complexity. This seems to mean that we
cannot think to any extent without the use of words, hence we can never
hope to frame intelligence tests that do not depend to a certain degree
upon linguistic knowledge and ability.
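The procedure recommended in paragraphs 3 and 4 amounts to a simple screening rule, which may be sketched as follows. This is an illustration under stated assumptions, not a procedure given by the writers: the three results are taken to be expressed on one common scale (here mental age in years), and the tolerance of one year and the low cutoff of ten years are hypothetical values chosen only for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExamRecord:
    """Results for one person, all expressed as mental ages in years
    (an assumption; any common scale would serve)."""
    name: str
    verbal_first: float   # first group test of the linguistic type
    verbal_second: float  # second linguistic test, given on a later day
    performance: float    # group test of the performance type


def needs_further_investigation(record, tolerance=1.0, low_cutoff=None):
    """Flag a record when the three examinations disagree by more than
    `tolerance`, or when every score falls below `low_cutoff`."""
    scores = (record.verbal_first, record.verbal_second, record.performance)
    disagreement = max(scores) - min(scores) > tolerance
    conspicuously_low = low_cutoff is not None and max(scores) < low_cutoff
    return disagreement or conspicuously_low


# Hypothetical records, for illustration only.
records = [
    ExamRecord("A", 13.0, 13.5, 13.0),  # substantial agreement
    ExamRecord("B", 10.5, 11.0, 13.5),  # verbal tests well below performance
    ExamRecord("C", 9.0, 9.5, 9.0),     # agreement, but conspicuously low
]

flagged = [r.name for r in records
           if needs_further_investigation(r, tolerance=1.0, low_cutoff=10.0)]
print(flagged)
```

A flagged record settles nothing by itself. It marks the individual for the interview, the study of academic records and the individual examination described above.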

From the results that have been obtained, particularly with
students at Brown University, based on investigations now extending
over a period of more than 4 years, the writer (C.) is convinced that
the group examination of the verbal type when given with care will
reveal in general the actual mentality of those tested to a degree of
accuracy that is sufficient for all practical purposes in from 80 to 90
per cent of the cases tested.

That it fails at times to indicate accurately the real mentality of 
the individual or a group of individuals does not mean that it should be 
abandoned, but it does mean that in cases vitally affecting the school 
career of the pupil utmost caution should be employed. 



Table I. Summary

                                        American    Italian
1. Number of pupils                        50          50
2. Grades                                  5-8         5-8
3. Ages                                   11-16       11-15
4. Average chronological age              13.12       13.11
5. Average pedagogical age                11.77       12
6. Average score (National)               103          90
7. Average IQ
     Terman                                92          91
     National                              86          76
8. Average score, National Test
     Part 1                                25      [illegible]
     Part 2                                20          17
     Part 3                                21          17
     Part 4                                10          10
     Part 5                                27          22
     Total                                103          90
9. Correlation between two tests       [illegible]     .79
Table II. Americans

Pupil      T       D
  1       15     -0.5
  2       16      0
  3       12     -2
  4       12     -2
  5       13.5    0
  6       14       .5
  7       14       .5
  8       12     -1
  9       12     -1
 10       14      1
 11       12     -1
 12       12.5    0
 13       12      0
 14       13      1
 15       12      0
 16       13      1
 17       12      0
 18       13      1
 19       13      2
 20       11      0
 21       11      0
 22       11      0
 23       12      1
 24       12      1
 25       12

[The N column, the last D entry, and the entries for pupils 26 to 50 are
illegible in this copy.]
Table IV. Table Showing the Relation between School Grade and
National Score

50 Italians
Table V. Comparison of Accomplishment of 173 Americans and 163
Italians in the VA Grade, as Measured by the Lippincott-Chapman
Classroom Products Survey Tests

Arithmetic Fundamentals Test
(Number of pupils at each score; in the original each star indicates an
individual score)

Score           Americans      Italians
Highest, 9         16             11
8              [illegible]    [illegible]
7                  21             22
6                  46             33
Median, 5          52             40
4                  24             29
3              [illegible]        14
2              [illegible]         6
Lowest, 1           0              3
Grand total       173            163
Table VI. Comparison of Accomplishment of 173 Americans and 163
Italians in the VA Grade, as Measured by the Lippincott-Chapman
Classroom Products Survey Tests

Arithmetic Problems Test
(Number of pupils at each score; in the original each star indicates an
individual score)

Score           Americans      Italians
Highest, 9         34              8
8                  23             10
7                  34             22
6                  34             48
Median, 5          36             41
4                  10             22
3                   0              8
2                   2              3
Lowest, 1           0              1
Grand total       173            163
Table VII. Comparison of Accomplishment of 173 Americans and 163
Italians in the VA Grade, as Measured by the Lippincott-Chapman
Classroom Products Survey Tests

Reading Selections Test
(Number of pupils at each score; in the original each star indicates an
individual score)

Score           Americans      Italians
Highest, 9         31              1
8                  14              1
7                  24              3
6                  43             10
Median, 5          46             33
4                  10             39
3                   3             30
2                   1             28
Lowest, 1           1             18
Grand total       173            163













Table VIII. Comparison of Accomplishment of 173 Americans and 163
Italians in the VA Grade, as Measured by the Lippincott-Chapman
Classroom Products Survey Tests

Reading Continuous Passage Test
(Number of pupils at each score; in the original each star indicates an
individual score)

Score           Americans      Italians
Highest, 9     [illegible]         2
8                  27              3
7                  36             14
6              [illegible]        14
Median, 5          24             48
4              [illegible]        28
3                   8             18
2                   4             15
Lowest, 1           7             21
Grand total       173            163
Table IX. Comparison of Accomplishment of 173 Americans and 163
Italians in the VA Grade, as Measured by the Lippincott-Chapman
Classroom Products Survey Tests

Total Test Score

Score           Americans   Italians
9 (highest)         51           4
8                   27           3
7                   31           8
6                   22          14
5 (median)          16          32
4                   16          37
3                    9          39
2                    1          19
1 (lowest)           0           7
Grand total        173         163











THE EFFECT OF ENCOURAGEMENT AND OF
DISCOURAGEMENT UPON PERFORMANCE

GEORGINA STICKLAND GATES

AND

LOUISE Q. RISSLAND

Barnard College, Columbia University

Instructions given to persons learning to administer psychological
tests usually include the advice to praise and encourage the subject
whose ability is being investigated. Terman,¹ for example, says:
"Nothing contributes more to a satisfactory rapport than praise of the
child's efforts. Under no circumstances should the examiner permit
himself to show displeasure at a response, however absurd it may be.
Exclamations like fine, splendid, etc., should be used lavishly."

The general assumption that encouragement has a favorable
and discouragement an unfavorable effect on performance is borne
out by an experiment of Gilchrist's.² He found that one group
of 29 subjects to whom he said, after giving a test, "A hasty
examination of the papers in the test just given shows that the
members of this group did exceptionally well. I ask you to take the
test again," improved 79 per cent in the second test, whereas a group
of 21 subjects to whom he said, "A hasty examination of the papers in
the test just given shows that the members of this group did not do
so well in the test as the average 12-year-old child would do. I ask
you to take the test again," made a 6 per cent lower group score in
the second examination than in the first.

In the present experiment an attempt has been made to investigate
further the effect of the experimenter's comments on two very
simple performances. The subjects used were 74 college students who
were given individually, after a preliminary exercise, two trials of the
Motor Coordination (Three Hole) and two of the Color-naming Test.
After taking the first coordination test, the first subject was told,
"That is really splendid! Do you always make such good scores?
In a curve of distribution your score would be way up here (indicating
a position at the top of the curve). Your score was so good that I
wonder if you would mind repeating the test?" After taking the test

¹ Terman, L. M.: "The Measurement of Intelligence." 1916, p. 125.

² Gilchrist: The Extent to Which Praise and Reproof Affect a Pupil's Work,
School and Society, Dec. 2, 1916.
again and after performing the first trial of color-naming, she was
encouraged similarly with words and inflections which had been
previously standardized. To the next individual who took the
coordination test, the experimenter said, "Oh dear, that is really a very
poor score. I am afraid that you would fall at the bottom of the
curve of distribution," etc. Expressions of disappointment and
sympathy were similarly offered at the completion of the first
Color-naming test. To one-third of the group no comment concerning their
performance was made; they were simply asked to repeat the test.

Certain obvious precautions were observed. The subjects promised
not to tell any other persons about the experiment. They
were asked to write down what they believed the purpose of the
investigation to be. Only two suspected the object, the others giving
such replies as "To find the correlation between motor coordination
and visual perceptions," "To find the speed with which one can react
to stimuli," etc. The experimenter had had considerable practice
in college dramatics, and that her performance was a convincing one
was evidenced by the genuinely elated or disappointed demeanor of
the subjects.

Although the selection was made purely on the basis of the order
in which the subjects chanced to come to the experimenter, the
three groups were found to be, as is shown by Table I, approximately
equal in initial ability. Table II shows the average improvement of
the individuals working under different incentives, and Table III the
percentage of each group which improved, fell off, or remained the same.
From Table II, it appears that the difference in average improvement
is slight, and that the SD's are relatively large. The direction of the
difference is not the same for both tests: in the Coordination Test
the order from greatest to least improvement is encouraged,
discouraged, repetition; in the Color-naming Test the discouraged group
made the greatest gain, the encouraged next, and the individuals who
merely repeated the test became less proficient. The small
percentage of improvement under encouragement (9 per cent in one test
and 0.007 in the other) contrasts with the 79 per cent found by
Gilchrist; similarly the improvement found under discouragement
(6 per cent and 0.01) is in disagreement with the deterioration of
6 per cent found by the former writer. The percentage of individuals
who improved (see Table III) is, in both tests, higher for the encouraged
than for the discouraged group, and higher for the latter than for the
repetition group. But the differences are again small; a change of a
few individuals from improvement to deterioration, or vice versa,
would make a difference in the order of the hierarchy.
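
The remark that a few individuals could reorder the groups can be made concrete with a rough sketch. It assumes group sizes of 26, 23 and 25, which are read off the subgroup counts in Table IV (8+10+8, 7+9+7, 8+9+8) and agree with the 74 subjects reported; the function names are illustrative only.

```python
# Group sizes inferred from the subgroup counts in Table IV (an assumption,
# not a figure stated directly in the text).
GROUP_SIZES = {"encouraged": 26, "discouraged": 23, "repetition": 25}


def points_per_individual(n):
    """Percentage points by which 'per cent improved' moves when one subject changes category."""
    return 100.0 / n


def shift(n, k):
    """Percentage-point change when k subjects in a group of n change category."""
    return k * points_per_individual(n)


for name, n in GROUP_SIZES.items():
    print(f"{name:12s} n={n:2d}  one subject = {points_per_individual(n):.1f} points, "
          f"three subjects = {shift(n, 3):.1f} points")
```

With groups of this size, two or three subjects changing category move a percentage by roughly 8 to 13 points, which is of the same order as the differences between groups reported in Table III.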

Though the groups in Table IV are necessarily small, an attempt
is made there to show whether the various incentives have any
differential effect upon groups unlike in initial ability. The variations in
the table appear to be chance ones differing with the test, with the
possible exception of the effect of discouragement upon the poorest
group of subjects. Whereas under encouragement, in coordination,
100 per cent, and in color-naming 63 per cent of the poorest subjects
improve, under discouragement only 57 per cent and 29 per cent make
better scores in the second test. The average improvement of 10.8
taps of the encouraged group contrasts with the gain of 0.9 taps of the
discouraged group in Coordination, and the gain of 5.1 seconds
contrasts with the loss of 0.11 seconds in Color-naming. Though the
apparent difference may well be a spurious one, it recalls another
quotation from Terman,¹ "In general, the poorer the response, the
better satisfied one should appear to be with it." Possibly this
apparent effect of discouragement partly explains some of the parts
of Table V where the negative correlations so frequently obtained
between initial ability and improvement appear, except in the case
of discouragement where the tendency is toward a zero or slightly
positive relation.

It is evident from Table VI that the effect of encouragement or
discouragement (if such an effect exists) is not necessarily the same
for the same individual in both tests. In 60 per cent of the cases,
individuals who improved in one test after being praised or
sympathized with did not improve in the other. The same phenomenon of
a larger number of individuals improving under encouragement and
discouragement than under repetition appears in this table as it did in
Tables II and III.

The results of this study seem to show, then, a very slight
difference in average improvement or even in percentage of individuals
who improve in the three groups. In this, the facts found are similar
to those observed in experiments on fatigue, lack of fresh air, sleep,
or food; the external factors seem to be of relatively little importance
in determining the score. Such difference as there is seems to be in
favor of encouragement or discouragement rather than mere repetition.
We might say then (with the usual realization of the inadequacy of
the data) that it is better to make some comment about the score than

¹ Terman: "The Measurement of Intelligence." 1916.
to make none; that it is a little better to make an encouraging than a
discouraging remark; that relatively poor individuals are more likely
to be unfavorably affected by discouragement than are relatively
proficient persons; that the effect of these incentives does not seem to
be constant for the two tests. The desirability of performing such an
experiment on more susceptible subjects, such as children, using more
complex and more reliably measured functions, is obvious.


Table I. Average Performance of Three Groups in Initial Test

                     Coordination                  Color-naming
Group          Average number of               Average time
               taps per minute       AD        in seconds        AD
Encouraged       [illegible]     [illegible]      63.6           0.4
Discouraged         92.6         [illegible]      67.0       [illegible]
Repetition          94.7         [illegible]      66.2           8.7


Table II. Average Improvement of Three Groups

                     Coordination               Color-naming
Group          Number of taps                 Time in
               per minute          SD         seconds         SD
Encouraged         8.7             7.1          0.4           4.3
Discouraged        5.4         [illegible]      0.7           4.6
Repetition     [illegible]         7.0         -1.4       [illegible]


Table III. Percentage of Subjects Who Improved, Fell Off, or Remained
the Same in the Three Groups

                   Improved       Fell off      Remained the same
Coordination
  Encouraged          89             11                0
  Discouraged         70             20           [illegible]
  Repetition      [illegible]        28           [illegible]
Color-naming
  Encouraged          68             38           [illegible]
  Discouraged         61             40           [illegible]
  Repetition          44             48           [illegible]
Table IV. The Effect of Incentives upon Groups Differing in Initial
Ability

Average improvement is in taps for Coordination and in seconds for
Color-naming; the remaining columns are percentages of subjects.

                          Average       Improved   Fell off   Remained
                          improvement                         the same
Coordination
 Encouraged
  Highest 8 subjects      [illegible]      63         33          0
  Middle 10 subjects         11.5         100          0          0
  Lowest 8 subjects          10.8         100          0          0
 Discouraged
  Highest 7 subjects          2.0          72         28          0
  Middle 9 subjects           8.4          89          0         11
  Lowest 7 subjects           0.9          57         43          0
 Repetition
  Highest 8 subjects          3.0          63         37          0
  Middle 9 subjects           2.0          46         33         22
  Lowest 8 subjects           9.4          87         13          0

Color-naming
 Encouraged
  Highest 8 subjects          0.5          75         25          0
  Middle 10 subjects      [illegible]      60         50          0
  Lowest 8 subjects           5.1          63         25         12
 Discouraged
  Highest 7 subjects          2.0          57         43          0
  Middle 9 subjects           1.0          67         22         11
  Lowest 7 subjects          -1.1          29         57         14
 Repetition
  Highest 8 subjects         -2.0          37         63          0
  Middle 9 subjects          -1.0          55         33         12
  Lowest 8 subjects           2.5          38         50         12


Table V. Correlation between Initial Ability and Improvement

Group           Coordination    Color-naming
Encouraged         -.33            -.24
Discouraged        -.03            +.10
Repetition         -.34            -.20
Table VI. Relation between Improvement in One Function and
Improvement in the Other

HERBERT A. TOOPS
Institute of Educational Research
Teachers College

(Continued from December) 

Other Causes of Poor School Work Besides "Motivation." The
AQ technique was not designed as a "cure-all." But humanity,
generally, seems to prefer to rely upon one simple technique, such as
the AQ procedure, rather than to attempt to analyze its ills in terms of
more complex and numerous causes. It can thus do no harm to point
out at this juncture a number of other factors that are in the aggregate
perhaps quite as important as "motivation." This may be readily
shown by making evident the fact that at best the AQ procedure
merely singles out the pupil who thereupon needs diagnosis and treatment
therefor, or an application of more "motivation" not primarily
designed to remedy a remediable defect. It does not diagnose the
cause of poor school work.

Both Pintner and Franzen seem to recommend the AQ procedure
as an administrative device to be undertaken particularly during
the course of the academic year to discover either those pupils, those
classes, those grades, or those schools which are not working hard
enough. For those pupils "who are not working hard enough"
incentive to improvement may be provided, and a very powerful
one, as above shown, especially when a graphic method is employed
for showing to a child his efforts as measured by himself. If a class
is found which is not doing work "up to par," evidently the
conclusion would be that the teacher is not doing her duty, or that
home or other conditions are to blame. If a grade is not up to par,
the conclusion might be that the textbook is at fault, the curriculum to
blame, or the general educational methods of the grade at fault.
And if an entire school is below par, the causes might be harder to
discover. Pintner cites other causes of classes doing poor work such
as: crowded conditions, poor attendance, inadequate lighting,
undernourishment, or bad physical conditions of pupils. A current
newspaper report of a Chicago investigation of the bearing upon scholarship
of movie attendance discloses the fact that of 3000 pupils in six



The Journal of Educational Psychology


schools, the 275 "best" pupils used in one week an average of 1.43
movie tickets per pupil, while in the same week the 275 "poorest"
pupils used an average of 1.83 tickets per pupil. Again, Reavis finds
high negative correlations between school achievement and distance
traveled to school by rural pupils.¹

An Analysis of the Elements Involved in School Placement. If we
believe that results thus far obtained from the AQ procedure are due
in large part to the ultimately less than unity correlation of mental age
and educational age, the educational problem then resolves itself
into those of: (1) proper placement or sectioning at the outset; (2)
special educational remedial measures to provide the "tools of learning"
(reading ability in many cases, special arithmetical processes in others)
for those who are doubly promoted or are over-promoted with respect
to their section (provided such pupils are not to be demoted) for the
present; (3) measurement of educational product at frequent intervals
during the academic year; (4) remedial measures and incentives for
those not making the "standard" amount of progress for pupils of
that section, due to other causes than over-promotion. The third
step, the measurement of educational product, will ultimately take
the place of the AQ procedure. This follows from the fact that
with adequate sectioning, the denominator of equation (1) is
practically the same for all people in a given grade, since proper sectioning
means exactly that thing: delimiting the range of capacity for progress
so that all pupils of a given class have approximately the same
capacity for progress. Thus ideal sectioning, once set going, automatically
becomes the AQ procedure; but with the hypothesis the converse
statement is not true, since the present IQ on any test cannot
hope to be the best possible measure of capacity for progress: the
AQ procedure once set going does not become ideal sectioning. This
suggests that in the long run, the AQ may be unable to prove in
practice as good a motivator as we might hope. The present correlation
of EQ and IQ is certainly less than 0.87 in all school subjects; with that
high a correlation, the standard error of estimate in predicting EQ
from MA is half as large as the standard deviation of the MA's.
And finally we must realize that all, the dull and the bright, by easily
created artificial motives of which the AQ is one of the most potent
and satisfactory yet brought to the attention of the educational
world, can be brought up to a point of educational attainment far
¹ Reavis, G. H.: Factors Controlling Attendance in Rural Schools. Teachers
College Contributions to Education, No. 108, New York, 1920, 68 pp.
beyond what we have thus far thought possible. Writers, philosophers,
have defined education as a continual readjustment process.
It is a commonplace to remark upon the fact that a boy placed in school
(sectioned) on the basis of IQ "does not stay put." Our promotions
must come oftener so that the remedial measures will be less necessary.
It will probably always be necessary for promotions to go by whole
steps, by jerks and jumps, rather than at a steady rate. Unless
educational offerings become steadily more difficult in exact parallel
with developing mentality and developing willingness to work,¹
then educational promotion must be jerky and remedial measures will
be required to provide the tools of learning for the promotee. In the
army E. and R. schools, promotions in certain courses came every 2
weeks provided the pupil passed a certain score on a standard test.
If the pupil failed to make the promotion he was retarded but 2
weeks. Many soldiers made three or four school grades of progress
in reading in 6 months by this scheme. This leads us to believe that
probably the school problem where the AQ procedure is applicable
is largely one of initial placement, or sectioning. It is also suggested
that a mentally bright pupil now in Grade VI but suddenly promoted
to Grade VIII may not at once begin to do there the work required
of him. It requires a distinctly higher level of reading ability to do the
work of Grade VIII than of Grade VI. Not subjected in Grade VI
to the Grade VIII reading requirement, the doubly promoted pupil is
lacking one of the essential elements or tools for progress in the
suddenly confronted new situation of the new grade. From this we may
conclude that ability to get along in school work depends upon at
least four generalized things: (1) willingness to work, effort, motive, or
interest; (2) mental ability, as may be measured by some sort of
general intelligence test; (3) previous preparation, one of the most
important components of which is reading ability as shown by Osborne,


¹ At the present time the grades become successively more difficult relative
to mental capacity. This is shown by the facts of school elimination and
the higher average IQ's in successively more advanced school grades. There are
practically as many individual rates of growth of intelligence as there are numbers
of individuals. A school grade must be adapted in difficulty to some given
M.A. level. The people of this M.A. level will have different rates of growth
of intelligence, so that when all may have the same M.A. at the beginning
of a semester, there will be large variations at the end of the semester. We
may hope that our schools of the future will come more and more to provide
individual rates of school progress for each pupil, such as is often now attempted
in certain business colleges and trade schools.
Mrs. Burgess and others; and (4) physical capacity. If pupils were
sectioned at the beginning of every semester on the basis of a properly
weighted composite scale made up of the four elements above, the
present correlation between "capacity" and educational age would
be very greatly increased, and with this change in r would disappear
most of the cases of "overworked or underworked" pupils. If,
finally, the schools were immediately to promote in school 1 year,
2 years, etc., just as large a percentage of the mentally bright as of
the mentally dull who are now retarded in school, and give these
advanced pupils special drill in reading and the drill of previous
preparation required to do today's problems, in a few months the
correlation would be still further raised toward unity, the goal of
the AQ workers.
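
The statement above that a correlation of 0.87 leaves a standard error of estimate about half as large as the standard deviation can be checked with a short sketch. It assumes the usual formula, in which the standard error of estimate equals the standard deviation multiplied by the square root of (1 - r squared); the function name is illustrative only.

```python
import math


def error_ratio(r):
    """Standard error of estimate expressed as a fraction of the standard deviation.

    Assumes the usual linear-regression formula sqrt(1 - r**2).
    """
    return math.sqrt(1.0 - r * r)


# Show how slowly the error of prediction shrinks as the correlation rises.
for r in (0.50, 0.70, 0.87, 0.95):
    print(f"r = {r:.2f}  error ratio = {error_ratio(r):.3f}")
```

For r = 0.87 the ratio is about 0.49, which is the "half as large" of the text; even a correlation that high leaves a good deal of uncertainty in predicting an individual pupil.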

How can a bright pupil who by mental ability "belongs" in
the school grade which is two grades in advance of the school grade
which is normal for one of his chronological age be expected to obtain
the educational progress of the 2-years-advanced grade if he is not
allowed to go into that grade? Concretely, Johnny, chronological
age of 10, is kept because of his age in Grade IV, which is normal for
a child of 10; he has a mental age of 12, which means that he is a bright
child, of mental capacity such that he should be able to do work of
about Grade VI. Now the AQ hypothesis assumes that he should
have the educational age 12 because he has the mental age 12. But
how can Johnny get an educational age of 12 if he has not been
subjected, and for some time, to the kind of educational content (Grade
VI content) to which mentally normal 12-year-olds are normally
subjected? It is of course probable that even with ideal sectioning
and the best remedial treatment, a few pupils would refuse to work up
to capacity. Such cases would, however, be pathological cases, rather
than cases requiring normal treatment; for, as has been shown, in ideal
sectioning the AQ procedure becomes merely measurement of product
for all normal (i.e., non-pathological) cases.
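
Johnny's case can be worked through numerically. The sketch below assumes the accomplishment quotient is the ratio of educational age to mental age, times 100 (equivalently EQ divided by IQ), which is the usual definition; equation (1) itself is not reproduced in this part of the article, and the educational ages tried are hypothetical.

```python
def quotient(numerator_age, denominator_age):
    """Generic age quotient on the customary 100-point scale."""
    return 100.0 * numerator_age / denominator_age


# Johnny, as described in the text: chronological age 10, mental age 12.
chronological_age = 10
mental_age = 12

iq = quotient(mental_age, chronological_age)  # intelligence quotient, MA / CA

# Hypothetical educational ages: Grade IV level (10) up to Grade VI level (12).
for educational_age in (10, 11, 12):
    eq = quotient(educational_age, chronological_age)  # educational quotient, EA / CA
    aq = quotient(educational_age, mental_age)         # accomplishment quotient, EA / MA
    print(f"EA = {educational_age}: IQ = {iq:.0f}, EQ = {eq:.0f}, AQ = {aq:.1f}")
```

Under these assumptions a pupil doing exactly normal Grade IV work shows an AQ near 83 and so appears to be "not working," although he has never been given Grade VI content.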

Some Elements in Ideal Sectioning. If better sectioning, involving
increased rates of promotion, is so desirable, how shall we secure it?
By the use of intelligence tests, has been the answer of the past. But
these are inadequate, although far better than the system of personal
judgment which they are fast replacing. Intelligence tests, and
particularly group tests, in the past have meant an expression almost
synonymous with "ability to make the average school marks as they
are now made in our public elementary schools." They predict these
but poorly. This may be a blessing in disguise, as will be pointed out
below in our discussion of the need for a socialized curriculum.

Accurate placement sectioning, as will be shown by one of the
authors in a forthcoming article, is best to be secured not by a general
intelligence test, or by a general intelligence test plus an educational
progress test, but rather by a scale involving both factors and more,
all factors being weighted so that the composite makes up a scale for
best predicting a specific subject ability. In the long run, it means that
best educational results would be obtained by sectioning with regard to
each separate educational process as measured by a scale of specific
application to that educational process. It points to a possible
discard of the "general" scales and the adoption of specific scales. Since
general ability is a concept of "averageness," or ability to make
average school grades, and since the numerator of equation (1) is most
general when we include in it the average subject-matter of many
subjects, we should expect, on statistical grounds alone, that the more
general accomplishment quotient will yield fewer extreme ratios than
the more specific subject-matter accomplishment quotients. But
subjects are not taught en generale; everywhere the pupil recites, now
in reading, now in arithmetic, that is, in a specific subject-matter.
The teacher of arithmetic is at present concerned mainly, if not solely,
with how well or how poorly the given pupil can do arithmetical problems,
and with the methods required in his case to teach him to work such
problems faster and more accurately. Of what avail for her purposes
to know that Johnny is up to par in "general effort" if "he simply
isn't working" in arithmetic? In matters of more general
administrative policy, the more general accomplishment quotients may be of
more value. Teaching involves the problem of adaptation of method
to the specific individual; school administration deals more with
matters of general policy. In many individual cases, the general
AQ, as a measure of general school attitude, may be of use to the
teacher, for it may be the key to the discovery of the lack of
motivation in a specific school subject, as, for instance, in the case of a boy
of good ability who has acquired a general distaste for all school work.
Motivation in special subjects undoubtedly correlates highly with
general school attitude, although higher in some school subjects
than in others. Good conduct and good marks in the grade school
generally carry over into the high school. Where lack of motivation
is due to bad home influences, the general AQ will be of more value
than in the case of those pupils whose maladjustment is subjective or
due to specific school conditions. In the future, we may expect much
more attention to be devoted to methods of learning and methods of
teaching specific subject-matters than to the problems of learning
and teaching in general.
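
The statistical point made above, that a general quotient averaged over many subjects should show fewer extreme ratios than a quotient for any single subject, can be illustrated by a small simulation. Everything in it is hypothetical: the number of pupils and subjects, the spread of the scores, and the limits of 85 and 115 used to call a ratio "extreme" are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate(n_pupils=2000, n_subjects=6, noise_sd=10.0, seed=1):
    """Return (single-subject AQs, general AQs) for a set of simulated pupils.

    Each pupil has a general level near 100; each subject AQ adds independent
    subject-specific variation to that level. All parameters are hypothetical.
    """
    rng = random.Random(seed)
    single, general = [], []
    for _ in range(n_pupils):
        level = rng.gauss(100.0, 5.0)
        subject_aqs = [level + rng.gauss(0.0, noise_sd) for _ in range(n_subjects)]
        single.append(subject_aqs[0])                 # one specific subject
        general.append(statistics.mean(subject_aqs))  # average over all subjects
    return single, general


def share_extreme(values, low=85.0, high=115.0):
    """Fraction of quotients falling outside the (assumed) limits."""
    return sum(1 for v in values if v < low or v > high) / len(values)


single, general = simulate()
print("spread, single subject:", round(statistics.pstdev(single), 1))
print("spread, general AQ:    ", round(statistics.pstdev(general), 1))
print("extreme, single subject:", share_extreme(single))
print("extreme, general AQ:    ", share_extreme(general))
```

Averaging cancels part of the subject-specific variation, so the general quotient clusters more closely about 100; this is the sense in which it may hide the pupil who "simply isn't working" in one subject.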

The problem of what tests to use in sectioning thus resolves
itself in the last analysis to a question of what tests will best predict
the rates of ability to progress in acquiring those things which we wish
to be acquired. This suggests that instead of immediately looking for
an ultimate measure or rate of getting along in school work, we should
instead look first for ultimate standards of desirability of what one
should obtain from education in order that we may recognize, by the
correlations, "a good measure of intelligence" when we find it.
Researches into what we should obtain from our school studies are sorely
needed. If we had an ideal course of study, then "ability to make
school marks" might be made quite synonymous with "intelligence,"
or with "school intelligence" at least. We must not conclude,
however, that the only worthwhile adjustment of the curriculum is the
adjustment of its difficulty to intelligence as our tests now define it.
The curriculum must be worked out on the basis of social needs,
whatever relation it may have with the verbal, abstract intelligence
needed to cope with it. Some pupils may even be totally unable to
profit by the one elementary school curriculum; for them a
differentiation of subjects is desirable in order that such talent as they have
may function to the best social advantage. Still others may profit
maximally from less than the total amount of the present curriculum.¹
We shall soon be compelled to answer the question of whether or not
it is desirable for a graduate of Grade VIII to enter high school
when the chances of his being graduated are very low; that is, when the
likelihood of failure during the first few years of high school is very
great. In a society such as ours, the "lowest 2 per cent" or more,
in any "general" ability may consider themselves badly misplaced,
"undesirable," fit subjects for replacement in our social-vocational
scheme by improved automatic machinery, or by removal to a less
desirable social class, by vocational elimination, or by segregation.
These are the facts, unjust to those unfortunate individuals as,
superficially, it may seem to be. While the sentimentalist is trying

¹ See the discussion of "upper auxiliary classes," Davis, H.: The Use of
Intelligence Tests in the Classification of Pupils in the Public Schools of Jackson,
Michigan. Twenty-first Yearbook of the National Society for the Study of
Education, Bloomington, Ill., 1922, pp. 134-136.



What Shall We Expect of the AQ?


meanwhile to decide the "justice" of the case, the school administrator
must daily be doing something about it.

Rapid Promotion¹ as a Possible Substitute for the AQ Procedure. Let
us also consider the bearing of rapid promotion upon the problems
of motivation, retardation and elimination. Would not a division
of the school course into 50 grades instead of 8 provide, by the rapid
promotions thus possible, the motive that is now seen to be efficient
in the results derived from posting or plotting the AQ's? It is so worked
out in the army schools above reported, and in a limited number of
school subjects, it is true. But is it too much to expect that other
school subjects will work out in the same way? With sectioning
based on ability to make progress in such courses of the "ideal social
values" in subjects, we shall approach perfect correlation between
"intelligence" and school ability. Pupils able to make the hurdles
frequently will be induced to stay in school with fewer compulsory
education laws. Wooley has said, and has been since substantiated
by others, that: "Failure, not economic necessity, is the cause of
two-thirds of our elimination from school before the eighth grade is reached."
This plan means promotion by subject. Why fail the pupil for a semester
because he fails in arithmetic? To such a question one hears the
plaintive reply: "But if all the above were desirable, which isn't
yet proven, how could it be administered?" The answer is that 50
grades does not necessarily mean 50 rooms nor 50 teachers, but only
50 times for promotion in the elementary school course. The
departmental plan of instruction is being tried out in a number of New York
City schools. Rapid promotion carries with it its own incentives.²
The administration of this plan can best be carried out in a very
large school. If such could be administered (and if proven desirable,
then it becomes the problem of the administrator, or educational
engineer, to devise the means) it would secure maximal effort of pupils.
A pupil in such a system enters a given grade only because his
intelligence, previous training and physical capacity are such as to enable
him to do successfully the work with pupils of his own kind; failure to
gain promotion would be self-admittedly deserved social disgrace.
But what is not possible in a modern metropolitan school where the
number of students is thousands instead of hundreds?¹ And finally,
is it too much to expect that our educational administrators will
work out plans for elimination from school by subject? Cannot Johnny
some day, under an enlightened educational system, hope to drop
arithmetic only, instead of dropping out of school entirely?

¹ Some educators are in doubt concerning the advisability of rapid promotion,
advocating rather an expansion in topics covered for the pupils of superior ability.
Franzen points out in his forthcoming dissertation that this is essentially a
makeshift device. For a published discussion of minimum and maximum as
related to the AQ see Murdock, K.: The Accomplishment Quotient; Finding and
Using It, Teachers' College Record, Vol. 23, No. 3, 1922, pp. 229-239.

² Someone is said to have chided Napoleon for giving so many toys to his
soldiers in the shape of medals. "Toys are those?" replied this thorough
understander of human nature. "Well, men die for toys." It is possible to maintain the
individuality of modern teaching and yet provide the "toys" of frequent promotion.

We have discussed the value of the AQ as a "motivator"; there
are other methods of motivation fully as potent.

Some New Aims of Education and New Emphases in Methods Suggested
by the AQ Procedure.⁴ The need in education for a measure of
accomplishment, achievement, effort, motivation, attitude, dynamic
effect of educational environment (call it what you will) is so great
that it seems quite possible that the AQ procedure may be accepted
uncritically by educators. Especially is this true in view of the
fact that when the above methods are applied to large groups of
pupils in a school system a very large majority of dull pupils are
shown by the indices to be working hard, while a very large majority
of bright pupils are shown by the indices not to be working hard.
"The bright pupil is the lazy one" might easily become an educational
shibboleth for quickly, and without proper consideration of what
is likely to be the outcome, changing the point of emphasis of the
teacher from the present one of paying particular attention to the
dull pupils to that of paying particular attention to the bright pupils.
The adoption of such a scheme in the school system would evidently
give the teacher the necessary "scientific foundation" for relaxing her
efforts with those unfortunate retarded pupils who popularly are
already working "beyond capacity and will never come up to par in
school work" and would allow her "conscientiously" to recommend
them for vocational courses. This is highly desirable, granted
(as it is very unsafe to do) that it can be assumed at once that such
dull pupils cannot, with proper "effort," achieve a socially acceptable
minimum of general education but can achieve a socially acceptable
minimum of mechanical or vocational education. Again, the general
adoption of such a measure might be expected to cause the teacher to
redouble her efforts with the "lazy bright ones" who "are so much
nicer to work with anyway," the geniuses of the next generation.
This likewise is desirable, granted (as is likewise unsafe) that high
AQ's in academic work are a method par excellence for developing the
latent capacities of "genius."

³ In New York City in 1921-22 there were 301 elementary schools with a
population of 1000 or over; 164 with 2000 or over; 30 with 3000 or
over; 6 with 4000 or over, and 1 with more than 6000 pupils.

⁴ The thorough student of mental [tests] should read Bagley, W. C.:
Educational Determinism; Or Democracy and the IQ, School and Society,
Vol. 15, April 8, 1922, p. 373. This should be followed by the reply,
Terman, L. M.: The Psychological Determinist; Or Democracy and the IQ,
Journal of Educational Research, Vol. 6, No. 1, 1922, pp. 57-62.

If the AQ procedure results in mentally retarded students being
advised in greater numbers than formerly to enter vocational courses,
some good results will be obtained. Stenquist has shown that the
correlation between mechanical ability and intelligence is not very
high. This permits of a goodly number who are below the average in
academic school work yet being above the average in mechanical ability.
It emphasizes the adjustment phase of education; that education is
not only individual adjustment, but readjustment, and makes more
apparent the place of vocational education in a general educational
system. Yet it must be noted that the correlations between
intelligence and mechanical ability, while low, are always positive.
This means that there are a considerable number of people in the world
who are neither good at general education nor at mechanical work.
Education cannot escape its duties in providing for these unfortunates
by the AQ route, a process of "hand them on" if they work to capacity
in academic education and yet get nowhere. A good axiom for school
administration is to keep the child moving as long as he is growing
mentally. It is absurd to keep a child in a grade 2 years, for
"repeating work deadens interest and spoils ambition." As long as the
child is developing mentally there is something he can learn; the
solution lies in the line of differential rates of progress and
differential curricula rather than in stricter compulsory education
laws. Where shall the vocational school hand on the people who are
neither good in academic work nor in mechanical work?

There is a still more serious count against the AQ procedure.
The AQ implying perfect correlation of EQ and IQ, when applied to
specific school subjects, would seemingly perpetuate the academic
standard upon the most specific of subjects to which this procedure
may be thoughtfully or thoughtlessly applied. Without caution
we may be led to disregard the vast body of empirical evidence which
points to a specialization of many abilities (lack of perfect
correlation among themselves) in different school subjects. If we are
to use the AQ we must soon decide whether people studying art,
mechanics, and automobile repairing, shall be required to come up to
1.00 AQ in those subjects; when obviously there may be little
correlation between the EA's in those subjects and MA's determined by
our present intelligence tests.⁵

A. I. Gates, following the suggestion of Hollingworth, has
confirmed the belief that the correlation between functions increases
as we approach the limit of practice in each. This lends strength to
the hope that the correlation of EQ and IQ in the ordinary grade school
subjects may be nearer unity than it now is. If pupils were ideally
sectioned and subjected to an ideal course of study, promoted just as
fast as ability to progress would warrant (absolutely individualistic
promotion) then we could provide some different sort of graphic
motivation process which would keep pupils working up to capacity.
If we can increase the correlation of EQ and IQ towards unity,
extreme AQ's will automatically tend to disappear. If pressure is
continually applied to schools whose AQ is below 1.00, in the long run
higher standards, new norms, will be set for our school work; there will
thus always be 50 per cent who will need special remedial measures to
bring them up to the ascending average, or norm. This would tend to
perpetuate and increase our present academic trend rather than aid in
discovering the social objectives of the curriculum. The present trend
is good with reference to the past; it is poor with respect to what the
future should demand of education. It is evident that ultimately we
have need for absolute standards of school proficiency rather than the
annually changing relative standards of position within a test score
distribution, or arbitrary "linear" scale based upon age or grade
averages, which we are now employing. We need tests of arithmetic
of the objective nature of "What is the average length of time required
by Johnny to multiply two-place numbers by two-place numbers with
an accuracy of 99.9 per cent?" With such tests on hand, we shall be
ready to abandon all annually varying relative measures such as
sigma, percentile or T-score position, percentage or letter grades.
With such absolute standards, the "passing mark" will be x per cent
of what is "essential," whether that be attained in 2 months, 10
months or 2 years. An important advantage for the improvement
of teaching will be secured, for then if two teachers have pupils of
the same average ability for progress, the product will be a very
objective measure of the teacher's ability to induce motivation and
to secure results.

⁵ Franzen states that the AQ procedure is valid in "each of the
elementary school subjects," by which he probably implies only the
conventional reading, arithmetic, history, geography, etc. He tested
out only arithmetic, reading and language.
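
The contrast between a relative norm and an absolute standard can be illustrated with a small sketch. The scores, the ten-point gain and the cut-off of 65 are all hypothetical; the only point is that a norm defined as the current median leaves half the pupils below it however much the whole group improves, while a fixed passing mark does not.

```python
from statistics import median

def share_below(scores, cutoff):
    """Fraction of scores strictly below the cutoff."""
    return sum(s < cutoff for s in scores) / len(scores)

# Hypothetical test scores before and after a uniform gain of 10 points.
before = [52, 58, 61, 67, 70, 74, 79, 85]
after = [s + 10 for s in before]

# Relative standard: the norm is re-set to the median of each year's scores.
relative_before = share_below(before, median(before))
relative_after = share_below(after, median(after))

# Absolute standard: one fixed passing mark (an assumed value).
ABSOLUTE_MARK = 65
absolute_before = share_below(before, ABSOLUTE_MARK)
absolute_after = share_below(after, ABSOLUTE_MARK)

print(relative_before, relative_after)   # share below the moving norm
print(absolute_before, absolute_after)   # share below the fixed mark
```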

If the AQ method brings the increased flexibility of the promotions
above shown to be so desirable, if it results in newer and increased
emphasis upon more adequate sectioning at the beginning of each
semester, if it makes it more evident that a child newly promoted
may be deficient in the tools of education at that point, if it provides
the real incentive which it promises in making lazy pupils work, then
it is a very real and worthy object of our admiration. Herring and
Gates have found that individual AQ's in a specific school subject can
readily be raised to 1.35, or multiples of that figure, if based upon
a general norm of all school systems. The truth is that probably
everyone can work harder than he now does. The AQ in its philosophy
does not connect itself with the problem of whether or not a child is
doing more than he is physically to be expected to do. This is a
problem of which we know only too little. In addition to giving many
tests, let us determine some of the objectives of an educational system
based on social needs, and then devise more specific tests which shall
insure the attainment of those objectives. Some will object by saying
that an IQ has certain merits in its very generalness as vs. the
specificness of specific scales for predicting specific subject matter
abilities. The IQ may correlate fairly well with capacity to progress
in reading, arithmetic, geometry, with ability as officer in the army,
as executive in industry, and many others. A specific weighting of
those tests would predict any one of these abilities better than does
the general weighting, but once so weighted would apply well to only
the one, or at best two or three of the abilities. It is true that we
may be forced in certain cases to retain the IQ procedure for reasons
of economy, even though specific weightings would give better
predictions of more specific abilities. It should be noted that, as
soon as we decide upon a "criterion of intelligence" we cease
attempting to measure general intelligence; we are then attempting to
measure a specific ability, that ability which the criterion measures.
Thus the determination of the criterion (what we are trying to
measure, expressed in concrete objective units) logically comes first,
and the construction of tests second. Without historically having had
the tests first as we have had in the past we would not realize half
as keenly as now the need of a criterion. Thus our work as scientists
logically leads us to become philosophers. Is it too much to hope that
science will prove our best tool in our new role of philosopher?

An EA measured over a socialized curriculum, rather than an
"effort" measure, is the measure of ultimate worth of school
attainment. We should not lose sight of the fact that an IQ of 0.70,
working at an efficiency, AQ = 1.00, is after all acquiring educational
assets which will be valued by the world as only EQ's of 0.70 are
valued. When such a boy with EQ of 0.70 becomes an adult, society
primarily values him for his physical brawn and industry, and not for
his "intelligence" or "education."
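
The relation among the three quotients in this example can be written out as a short sketch. It assumes the customary definitions, which this section uses without restating: IQ = MA/CA, EQ = EA/CA, and AQ = EQ/IQ = EA/MA, where CA, MA and EA are chronological, mental and educational age. The function name and the ages in months are hypothetical, chosen to reproduce the case of an IQ of 0.70 working at AQ = 1.00.

```python
def quotients(chronological_age, mental_age, educational_age):
    """Return (IQ, EQ, AQ) from three ages given in the same unit."""
    iq = mental_age / chronological_age
    eq = educational_age / chronological_age
    aq = educational_age / mental_age  # equivalently EQ / IQ
    return iq, eq, aq

# Hypothetical pupil: 120 months old, mental and educational age 84 months.
iq, eq, aq = quotients(chronological_age=120,
                       mental_age=84,
                       educational_age=84)
print(f"IQ = {iq:.2f}, EQ = {eq:.2f}, AQ = {aq:.2f}")
```

Because AQ is the ratio of EQ to IQ, a pupil at AQ = 1.00 has an EQ equal to his IQ, so "working to capacity" leaves the EQ of 0.70 unchanged.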

Thus our consideration of the AQ has solved nothing, although it
has raised many interesting problems.

One of the advantages, if not the chief advantage, of the group as
compared with the individual test of intelligence is the saving in time
where large numbers are to be examined. A class of 40 or a group of
several hundred may be examined in the time which it might ordinarily
take to examine any one of the group individually. To secure this
result with as little loss as possible in the reliability of the individual
finding is, of course, one of the desiderata of group testing.
Experiments have indicated that, other things being equal, the longer
and more varied a test, the greater its reliability. In view of the time
saving in the aggregate as compared with the individual tests, it
would seem fair to allow a generous allotment of time to the group
tester. His tests may be given in several sections and in short periods,
and the added returns will furnish the experimental basis for the
selection of the most effective tests for a more abbreviated series, if
that proves desirable. The Dearborn Group Intelligence Tests were
planned by the author with such considerations in mind. It was
expected that they would be found long, especially in the hands of
untrained and inexpert examiners. In the experience of the writers,
when the duration of the periods of testing has been properly
controlled they have not proved irksome to the examiners, much less to
the examinees. Statistical studies have, however, shown that an
abbreviation of the tests is possible without changing materially the
results at present obtained. The present revision has, therefore,
eliminated certain of the tests. A subsequent revision might well
proceed on the basis of replacing those tests by others with the purpose
of further extending the usefulness of the scale.

The basis for these eliminations and for various minor changes
is described below.

The examiners and scorers had noticed that a number of the tests
in the First Series were so easy that they seemed of little value in
differentiating pupils of various intelligence levels. It was determined
to investigate this point, and so the scores of each separate test in
Series I were distributed by half-year intervals, and the medians were
found.* The results of this study appear in Table I, and they show
that some of the tests do not contribute especially to the diagnostic
value of the examination above the early years. This fact is even
more strikingly apparent when expressed graphically as in Diagram 1.
Here it is seen that the maximum score on the Color-form Test is
reached at an early age, and that there is no significant differentiation
after the seventh year. On the other hand, there are continued
increments of the Page 2 score, and the maximum is not reached by any
considerable portion of the pupils of the twelfth year. Since these
latter tests and others in the Series appear to furnish sufficient
differentiation in the early years as well as in the later years, it was
decided to drop the Color-form Tests. The writers would suggest,
however, that a series of tests beginning with such a one as the
Color-form Test, or a simpler test, each of which reached plateaus at
successively higher levels might furnish a better scale than a
combination of tests which have a larger range of age differentiation,
such as is now found, for example, in the Opposite Completion Test of
the Second Series.
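
The procedure described above (distributing the scores of one test by half-year age intervals, taking the median of each interval, and looking for the age at which the medians stop rising) can be sketched as follows. The records, the maximum score and the 95 per cent plateau threshold are hypothetical; the writers report their medians in Table I and do not state a numerical rule for a plateau.

```python
from collections import defaultdict
from statistics import median

def medians_by_half_year(records):
    """Map each half-year age interval to the median score within it.

    records: iterable of (age_in_years, score) pairs.
    """
    buckets = defaultdict(list)
    for age, score in records:
        half_year = int(age * 2) / 2  # 7.3 -> 7.0, 7.6 -> 7.5
        buckets[half_year].append(score)
    return {age: median(scores) for age, scores in sorted(buckets.items())}

def plateau_age(medians, max_score, share=0.95):
    """First age interval whose median reaches share * max_score (assumed rule)."""
    for age, med in medians.items():
        if med >= share * max_score:
            return age
    return None

# Hypothetical (age, score) records for one easy test with maximum 20.
records = [(6.1, 8), (6.4, 10), (6.7, 14), (6.9, 16),
           (7.2, 19), (7.4, 20), (7.6, 20), (7.8, 20)]
medians = medians_by_half_year(records)
print(medians)
print(plateau_age(medians, max_score=20))
```

A test whose medians reach the ceiling early, as in this made-up series, has little left to distinguish older pupils, which is the ground given for dropping the Color-form Tests.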

It would be possible, of course, to use the Color-form Test by 
increasing the number of items or by decreasing the time. The 
former plan, however, would lengthen the test, and the time limit
would have to be reduced to something less than half a minute to get 
a proper differentiation on the test. This would make for inaccuracy. 

Other considerations were taken into account in the elimination of
some of the tests. The Map Test, for instance, is a very good test
for the differentiation of the older pupils, but it was omitted from
the revision because teachers found the scoring rather difficult.

* In selecting the separate tests, pages 1 and 2 of GI were each taken
as a whole. No smaller division seemed worth while, as each of those
pages is made up of a number of short problems, no one of which
receives a considerable score except the Ball and Field Test.

After the revision of the examination, the scores on the revised
form were found, and were correlated with the original scores. In
one town, where 532 children in the first three grades had been tested,
the Pearson Coefficient of Correlation was found to be 0.97. In
another community where all grades through the eighth were tested,
a group of 822 children under 13 years of age gave a correlation of
0.99 on the two forms.* These high correlations indicate that the
short examination will give the same results as the longer form.
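
A minimal sketch of the computation behind such coefficients follows, together with the grouping of scores into class intervals that the writers mention in a footnote as a possible influence on them. The paired scores are hypothetical, and replacing each score by the midpoint of its interval is one common convention, assumed here rather than taken from the article.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of at least 2 cases")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def group_score(score, width):
    """Replace an integer score by the midpoint of its class interval."""
    lower = (score // width) * width
    return lower + (width - 1) / 2  # interval 5-9 -> 7.0, 10-19 -> 14.5

# Hypothetical paired scores on the original and the revised form.
original = [12, 25, 33, 41, 48, 56, 63, 70, 82, 95]
revised = [20, 41, 58, 77, 85, 101, 118, 124, 150, 171]

r_raw = pearson_r(original, revised)
r_grouped = pearson_r([group_score(s, 5) for s in original],
                      [group_score(s, 10) for s in revised])
print(round(r_raw, 3), round(r_grouped, 3))
```

Running both versions on the same data lets a reader see how far the interval widths (fives on one form, tens on the other) move the coefficient in a given sample.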

The figures and diagrams here given present the results in only one 
community. As was pointed out in a previous article, results on the 
Dearborn examinations were not lumped, but were worked up 
separately. Other results bore out the findings here presented. 

Series II was treated in exactly the same way as Series I, and
with similar results. It will be unnecessary to reproduce the Table
giving the separate test age scores, but Diagram 2 presents part
of the evidence graphically. Here we see that the Opposite Completion
Test is a very good one for the purposes of the examination.
It has a possible score of 34 points, while the 16-year-old children in
school score on the average 26 points only. There is no large bunching
of scores at the upper end of the curve, and the test would probably
differentiate a year or two farther of the selected age groups in school.
On the other hand, the Maze Test, which has a maximum of 26
points, shows a median score of only 7 for the sixteenth year. This
test took considerable time to give, yet it turned out to aid very little
in differentiation.

The Memory for Digits (Ladders) and the Picture Symbol Tests
gave similar results. The median score of the 16-year-olds in the
Memory Test was only 7 out of a possible 10, while with the Picture
Symbols they got only 8 out of a possible 20.

The dropping of the three tests just discussed so decreased the
time that it was not necessary to make further deletions. This made
it possible to keep the plan of having two parts to the test, a practice
which seems highly desirable, as it is thus possible to sample the
individual's work at two different times.

Correlations were made with the scores on each form of the
examination grouped in 10-point intervals. The tests had been given
to the children from Grade II through Grade XII in three different
communities with groups of 1023, 1430 and 1821 cases. The
Pearson Coefficients were 0.99, 0.90, 0.90, which show that there
was practically no loss in accuracy from dropping the three tests.

* The method of grouping the scores may be in some measure responsible
for these high coefficients. Scores on the first edition were grouped
by fives (0-4, 5-9) and those of the revised examination by tens (0-9,
10-19). Maximum scores are 100 (300 divided by 3) on old test, and 182
on revised form.

The Journal of Educational Psychology

Diagram 1. Comparison of scores, by half-years, of two Series I Tests.

Diagram 2. Comparison of scores, by age, of two Series II Tests.

It was also found desirable to make a few slight modifications
in the tests which were retained. These changes do not alter the
tests materially, but aid considerably in the giving of the examination
and the correction of the papers. In the First Series a cat has been
added in the second item, and directions are now given to draw a dog
chasing the cat. A number of "boxes" are added in places where the
child is to write numbers. The coins have been enlarged to very
nearly their actual size, and the field for the Ball and Field Test
has been turned about a bit.

There have been a number of objections made to the tests on
pages 1 and 2 of Series I because there are no definite time limits for
them, and because in some instances the judgment of the examiner is
required in scoring. In spite of these objections no change was made
in this part of the test. As the two pages in question are filled
with relatively short questions the time limits would necessarily be
very short, and the examiner would be forced to pay more attention to
his watch than to the group. A loss of spontaneity would result which
would be highly undesirable, and almost disastrous in the lower grades.
The accuracy of measurements obtained under such conditions would
be highly questionable. Another point to be held in mind is the fact
that the tests on those two pages are largely adapted from the Binet
Tests which impose no time limits, but allow the subject to work at his
own speed. Many of the performances upon which the judgment of
the examiner is required, such as copying the diamond or square and
drawing the star, are modeled after the Binet where practically the
same method of scoring has been used with much uniformity.
Furthermore, there have appeared recently some rather weighty
criticisms of examinations which are composed entirely of "objective"
tests.* For these reasons the writers feel that the retention of
the tests in their present form is justified.

The second part of the clock question, in which the children 
were asked to put hands in clocks to show various times, had given 
considerable trouble. In responding to the direction: “Show how 
the clock looks when school begins in the morning,” the children were 
likely to record the time when the school doors opened, the time when
they could get into their rooms, the time when they were expected to
be in their seats, or some other such time. Also, the sunrise and sunset
questions were hard to score. To obviate these difficulties the first
three questions call now for definite times: six o'clock, half past four,
and quarter of two. Centers have been added to the clock faces to aid
the children in their drawing.

* See, for example, Roback, A. A.: Subjective Tests vs. Objective Tests,
Journal of Educational Psychology, Vol. XII, No. 8, Nov., 1921.

Test 17, the First Substitution Test, is given a slightly longer
time in the revision. There were no perfect scores on this test, even
among the eighth grade pupils, and it was thought that another half
minute would produce greater differentiation among the superior
pupils. This was desirable because of the omission of the map
question, which had been the chief aid in differentiating the superior
pupils.

The Picture Completion Test (No. 19) in the first edition
consisted of fifteen items and covered three pages. It was found that
these items fell into groups according to their difficulty, and for the
revision one or two items were picked from each group. This gave a
test of seven items, and the size of the pictures was slightly reduced so
that all could be placed on a single page. Two practice problems were
included, partly to compensate for the practice effect which was
present in the longer test, and which probably aided the pupils in the
solution of the later items. In order to keep the scores comparable,
two points were assigned to each item in the revised form of the test.

Although the map questions are omitted from the new form, the
map is reproduced for use in the Ruler Test.

The retained Series II Tests also underwent slight modifications.
In Test 1, the Picture Sequences, it had been found that a few of the
pictures were very often misinterpreted, and these were changed. The
pages have also been improved by putting the practice pictures on
the previous page, thus leaving room for blank spaces between adjacent
sets of pictures. The time of this test was reduced from 7 to D minutes,
as it appeared that the shorter time gave ample opportunity for all the
pupils to do all that they were capable of doing correctly.

In the Word Sequence Test, also, there were a few items which 
were somewhat ambiguous, and these were changed for clearer forms. 
The time limit of this test was also reduced from 7 to 6 minutes. 

The Formboard Test had one of the figures out of proportion, and
this was redrawn correctly. Also, the figures which are used for
demonstration were lettered so that it would be easier for the
examiner to refer to them.

No change was made in the Opposite Completion Test (No. 4) 
except that the time limit was lowered to 7 minutes. 

The time necessary for the Proverbs Test was cut down by
requiring only the first word to be numbered. A second example was
added for demonstration, and one or two changes were made in the tests
at points which had given difficulty. There was some thought of
dropping out the part of this test in which the children are asked to
explain proverbs in their own words as the scoring of this part of the
test cannot be made as objective as the rest of the tests. However,
this part of the test proved excellent for picking out the very superior
individuals, so it was finally retained.

A number of the drawings in the Faulty Pictures Test proved very
unsatisfactory, and so the upper half of this page was redrawn. No
change was made in the time limit.

In the first part of the Number Puzzles a correction was made
in the perspective of the last diagram. The time for this test was
reduced to 2 minutes, as it was found that practically everyone, even
in the lower grades, could finish within this time. In the second part
the demonstration examples were changed by leaving out the sums on
the columns and rows, and a figure was omitted from the first of these.
The two problems which had more than one possible solution were
changed.

These revisions have made of both series considerably more
convenient instruments for both tester and scorer. There is little loss
of the range of variability, except among the very superior pupils in
Series II, and discrimination at the extreme upper end of the scale is
not ordinarily of great practical importance. Very convincing
evidence will be presented in a forthcoming article that in these tests
the average adult age is approximately 14½ years. The differentiation
of superior adults is made up to intelligence quotients of 140.
Further differentiation of the small number of cases beyond this level
would ordinarily be made by individual examinations.

The saving of time is especially valuable in Series II, as either part
of the revised test can be given in the ordinary high school period. It
is possible, when a rapid survey is necessary, to use Examination A
alone in the first three grades, and Examination C in the others. The
correlations of these separate parts with the whole examinations are
high, running well over 0.90 in a number of communities. This
practice is not recommended, however, as the more tests we have
the more likely we are to avoid the contingency that a chance failure
or success in any one test might misrepresent the individual's true
capacity.
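
The trade-off between a shorter examination and a more dependable one can be illustrated with the Spearman-Brown formula, a standard psychometric result that the writers do not cite. It predicts the reliability of a test lengthened (or shortened) by a given factor, on the assumption that the added or removed parts are parallel to the rest. The function name and the reliabilities used are illustrative, not figures from the Dearborn studies.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when test length is multiplied by length_factor.

    Assumes the added (or removed) parts are parallel to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Illustrative values only.
print(spearman_brown(0.60, 2))    # doubling a test of reliability 0.60
print(spearman_brown(0.60, 0.5))  # halving the same test
```

On these assumptions, giving one part alone behaves like halving the test, and the predicted reliability falls below that of the whole examination, which agrees with the caution against relying on a single part.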



NOTES ON ARTICLES IN EDUCATIONAL
PSYCHOLOGY IN CURRENT ISSUES OF
OTHER MAGAZINES


REPORTED BY CECILE COLLOTON
Department of Educational Psychology, The Lincoln School of Teachers College

Intelligence Tests

Notes on Racial Differences. A. M. Jordan. School and Society, 1922, Oct.
28, 503-504. Data collected at Fort Smith, Arkansas show pronounced differences
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vided for. 

Adjustments in California. Will C. Wood. Journal of Educational Research, 
1922, November, 320-331. Big problems in educational research that educational
testers can help to solve. 

Statistical and Non-statistical Interpretation of Test Results. Samuel W.
Fernberger. The Psychological Clinic, 1922, May-June, 68-72. Diagnostic value of
tests more important than final test scores. Non-mathematical interpretation
of results more significant than statistics.

More Accurate Use of Composition Scales. Edward William Dolch. The
English Journal, 1922, November, 530-544. Points out three distinct sources of
inaccuracy in the use of English composition scales and suggests ways of meeting
these difficulties.

Using the Results of Measurements in Reading in Training Student-teachers.
M. Elizabeth James. The Elementary School Journal, 1922, November, 190-196.
Original exercises in checking comprehension and improving rate in reading.
Remedial measures used in individual cases described.

The Conversion of Test Scores into Series which Shall Have Any Assigned Mean
and Degree of Dispersion. Clark L. Hull. The Journal of Applied Psychology,
1922, September, 298-300. Describes a formula for making scores from a number
of tests directly comparable.

A Method For the Study of Vocational Interests. Max Freyd. Journal of
Applied Psychology, 1922, September, 243-254. A detailed description of two
questionnaires for studying the interests, likes and dislikes of an individual as a
dynamic basis for vocational guidance and selection.

A Comparison of the Achievement of High School and University Students in
Certain Tests in Chemistry. S. R. Powers. Journal of Educational Research,
1922, November, 332-343. Students of chemistry in the larger high schools do
practically as good work on this particular test as university freshmen.

The Homewood Demonstration School at Johns Hopkins University. M. Rose
Patterson. School and Society, 1922, Nov. 18, 577-584. Description of the
conditions under which a half year's work was covered successfully in 2 months.

A Comparison Test for Investigating the Ideational Content of the Moral
Concepts. R. A. Brotemarkle. Journal of Applied Psychology, 1922, September,
236-242. Description of a test designed to measure the background of "moral
concepts" of an individual. Results of the tests of 331 college students compared
with their quintile ranking on other tests.

The Psychology and Pathology of Personality. Vernon M. Cady. Journal of
Delinquency, 1922, September, 226-232. "A summary of test problems and
bibliography of general literature."

When Children Read for Fun. Jenny R. Green. School and Society, 1922,
Nov. 25, 614-[?]. Reports an experiment with 600 sixth and seventh grade
children to find correlation between reading ability and material chosen when
"reading for fun."

An Ancient Score Card. Elmer E. Jones. The Psychological Clinic, 1922,
March-April, 2[?]-36. Description of a score card used 20 years ago. Its
predictive value as shown by the later achievement of individuals whose cards are
reproduced.

Diagnostic Teaching. M. Alice Weir. The Psychological Clinic, 1922, May-
June, 116-122. A case study.

A Study in Grades and Grading under a Military System. Robert L. Bates.
Journal of Experimental Psychology, 1922, October, 329-337. Describes the
rigid system of grading in use at the Virginia Military Institute, Lexington, Va.

An Ethical Discrimination Test. S. C. Kohs. Journal of Delinquency, 1922,
January, 1-16. A new battery of tests to measure other qualities than
intelligence. Two tests are new; others are new compilations of old material. Not yet
standardized. May be used either as individual or group test.

Individual Variability in Test Performance. Mervin A. Durea. Journal of
Delinquency, 1922, March, 80-98. Detailed study of 150 cases, 50 at each of the
mental age levels, 9, 10 and 11. Comparisons of four types of tests: Stanford-
Binet, Performance Tests, Association, and Literacy.

Home Conditions and Native Intelligence. W. W. Clark. Journal of Delinquency,
1922, January, 17-23. The Whittier Scale used for grading home conditions
of boys committed to a state industrial school. Four tables compare general
quality of homes and IQ's of boys.

The Functions of Tests and Scales as Found in Recent Educational Literature.
A. G. Capps. Journal of Educational Research, 1922, October, 201-208. Reports
an investigation of fiO different educational books and magazine articles. Detailed
tabulations and summaries of the topics noted in the title are given.

The Theory of Differential Education as Applied to Handicapped Pupils in the
Elementary Grades. J. E. Wallace Wallin. Journal of Educational Research,
1922, October, 200-221. Recommends an organization of special classes for
deficient or defective pupils. Illustrates the application of such class organization.
Pleads for trained workers and a careful use of tests.

Calculation and Interpretation of Percentile Ranks. L. L. Thurstone. Journal
of Educational Research, 1922, October, 225-235. Description and explanation
of the percentile curve and its graphical calculation.

The Age-grade Status as an Index of School Achievement. Charles L. Harlan.
Educational Administration and Supervision, 1922, October, ‘tlS-423. Mental
ability as shown by intelligence tests and school achievement as shown by
educational tests are better bases for classification than age or years in school. Eight
tables give detailed data.

Another Educational Campaign. J. W. Richardson. Journal of Educational
Research, 1922, September, 07-101. An experiment in the teaching of spelling
and its results.

The Variability of Teachers' Marks. Nathan Silberstein. The English
Journal, 1922, September, 4M--I24. An analysis of the rating of seven questions
on a Regents examination paper by 31 members of an English department.

The Psychology of Solving Puzzle Problems. Joyce E. Mather and Linus W.
Kline. The Pedagogical Seminary, 1922, September, 2fiO-2K2. An experimental
study of reasoning, using iiippimnipiil puzzles as material. Full details as to
procedure and results.

The Reliability of Judgment of Personal Traits. John Slawson. Journal of
Applied Psychology, 1922, June, 101-171. Twenty-five teachers in each of six
schools were given independent ratings in 11 personal traits by four or more judges.
Results are studied in detail.

Computing Intercorrelations of Tests on the Adding Machine. Herbert A. Toops.
Journal of Applied Psychology, 1922, June, 172-18'1. Details of a procedure for
the rapid calculation of intercorrelations.

Ranking Students from Their Literal Grades. J. Charles Rathbun. School and
Society, 1922, Sept. 16, 3211-335. Proposes a method of evaluating grades on the
basis of the normal probability curve.

Standardization of the Whipple-Healy Tapping Test. Amy Hewes. Journal of
Applied Psychology, 1922, June, 113 -110. Describes in detail the establishing of
age standards for a test measuring speed and accuracy of eye-hand coordinations.

The Reconstruction of Democracy. George B. Cutten. School and Society,
1922, Oct. 28, 477-181). The value of an intellectual aristocracy. The importance
of the general and normal training of the "mentally alert."

Facing the Facts. W. L. Ettinger. School and Society, 1922, Nov. 4, 505-612.
Describes the New York school situation with regard to ages of pupils, rates of
progress, achievements of class groups, etc. Need for better classification.



NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


1. A Textbook of Psychology.—The author¹ of this book emphasizes
its scientific character even to the extent of including the term "scientific"
in the title. Again in the preface, we are told that this book
represents a definite departure from tradition. Just what tradition
is meant, the reviewer finds it difficult to imagine. At any rate, the
book itself does not differ from the conventional textbook of psychology
in any material way. We find the usual lengthy treatment of the
senses, sense perception and the nervous system. Indeed, fully two-thirds
of the book is devoted to these topics. Scattered throughout
the pages, we find several instances of our present-day popular psychological
game of baiting the introspectionist, which, to the reviewer,
seem out of place in an elementary text.

The author, however, is no behaviorist, since he describes psychology
as "a science of the conscious responses of the organism."
He dwells upon the value of psychology in furnishing materials applicable
to the problems of physical science, education, industry and the
arts, and to social problems. Little attempt throughout the book is
made to show the student the applications of the various topics discussed
to the above-mentioned fields.

In the chapter on thought and thought content, there is much
discussion of "imaginative types," such as "the visile, audile, tactile,
gustile, olfactile and motile." Because "workers in mental measurement
are for the most part persons of intensely practical bent," they
have neglected the study of imaginative types. To discover the
various types, we must use the method of simple report, which begins
"Imagine a pan of onions frying on a stove." And then follow the
usual questions.

The book is interesting, and the discussion of sensation and per¬ 
ception is detailed and well done. It will not prove an easy book for 
the beginner. For the student of education or the educational psy¬ 
chologist, it has nothing of particular interest. 

E. E.

¹ Dunlap, K.: "The Elements of Scientific Psychology." Mosby, St. Louis,
1922, pp. 308.

2. A Book about Tests.—This is the age of standard tests, educational
and intelligence. Teachers, superintendents and others are using
them by the million, and are finding them instructive and useful for all
sorts of purposes. The number of books about tests is very small and
a new one is, therefore, welcome. The authors of the present volume¹
have tried to explain in simple terms what standard tests are and for
what purposes they should be used. They devote a chapter to a
description and discussion of the chief tests in each subject, such as
arithmetic, reading, and the like. Two chapters are devoted to
intelligence tests and naturally the treatment has to be brief and
sketchy. A glossary of terms at the end of the book should be useful
to the beginner in the subject.

The book commends itself to the reviewer for its simple and pleasant
style. It should prove extremely useful to the thousands of teachers
who are anxious to know what this testing movement is all about and
who have not the time or inclination to delve deep into the subject.

R. P. 


3. The Recorded Activities of Six-year-olds in an Experimental
School.²—An unusual conception of the nature and method of curriculum
reconstruction is exemplified in this booklet. After questioning
the fundamental assumptions underlying current practices and points
of view, these workers have endeavored to blaze a new trail. The
significant fact in this connection is that they have kept a careful day
by day record of the resulting activities, and thus made available a
mass of data necessary to the evaluation of their work.

The necessary materials and furnishings are listed and a typical
program is presented. Then the record of each week's work is given
under the following heads: (I) Play experiences, (II) Practical
activities, (III) Special training, and (IV) Organization of information.
Under the third heading the reader is surprised to note the
emphasis on number experiences, the formal nature of some of the
special training in music and the evident predetermined postponement
of reading. The notes on language work show that the experimenters


¹ Pressey, S. L. and Pressey, L. C.: "Introduction to the Use of Standard
Tests. A Brief Manual in the Use of Tests of Both Ability and Achievement in
the School Subjects." World Book Company, Yonkers, N. Y., 1922.

² Scott, Leila V., Pratt, Caroline, and others: Record of Group VI, 1921. City
and Country School, New York, 105 West 12th Street, pp. 72.

have formulated with more or less clarity a hypothesis concerning the
approach to reading. We can only hope that the conditions of the
whole experiment may be controlled and set with sufficient care to
test that hypothesis scientifically.

There is no clear statement of objectives and the reader is left to
evaluate the "curriculum" as he sees fit. New criteria are necessary
to determine the validity of new departures and those who initiate
untried procedures should formulate the objectives in the light of
which a true critique of their endeavors is possible.

Those who consider the record a proposal miss the point. The
significance of adequate records in furnishing points of reference
and departure in experimentation is seldom realized because they are
infrequently put into shape and published. The technique of record
keeping needs illustration. The analysis of the group records and
study of the growth of individuals is a further step of experimental
technique, facilitated, but not exemplified by this booklet.

L. Z. 


4. A Manual of Suggestions on Education Related to Sex.¹—Prepared
under the direction of the Surgeon General of the United States Public
Health Service in collaboration with the Bureau of Education, this
Manual is well sponsored. There are those whose prejudices will
incline them to dismiss this book without a hearing. It would be
salutary, indeed, if such persons could be induced to read the list of 60
contributors whose names appear in the preface, and enough of the
introduction to enable them to realize how much of their social and
civic responsibility they are side-stepping by a closed-minded adherence
to a position which has become untenable. Because most conscientious
objectors have considered the subject too narrowly, we are
prompted to quote this definition of the problem:

"In harmony with the rapidly developing psychology of education, it becomes
necessary to conceive of education in relation to sex as but a phase of character
education as a whole. As such, 'sex education' means vastly more than instruction
concerning sex; it means a comprehensive and progressive process of care, guidance,
and example extending over a long period of years, from infancy to maturity.
Moreover, sex education is a social and a socializing process; both in its progress
and in its results it reaches far beyond the boundaries of the physical person.
Because of the far-reaching effects of the eventual attitude and practices of the

¹ Gruenberg, Benjamin C., Ph. D., Editor: "High Schools and Sex Education."
Government Printing Office, Washington, 1922, pp. VII + 08. $0.60.

individual, sex education carries with it obligations of the widest social importance.
As a phase of character formation, sex education must include all the instruction
and training that may help to form normal and wholesome attitudes and ideals
in relation to sex, and to shape conduct in accord with such attitudes and ideals.
Such education must, therefore, be developed as an organic part of the entire
educational program, and not be considered a special and isolated bit of ritual to
be performed at a given time, and then dismissed as finished."

The Manual incorporates the results of experimentation for the
practical guidance of thousands of teachers who realize the significance
of their tasks but are not equipped to proceed without guidance.

Three chapters are devoted to the general aspects of the problem.
Each of the succeeding seven chapters then takes up the relation of
sex education to the content of the course in some subject. To
illustrate we quote the list of topics treated in Chapter Ten, under the
heading, The English Course: "(1) English and literature, (2) Literature
and life, (3) Teaching discrimination, (4) Positive ideals,
(5) The negative side, (6) Literary biography, (7) English composition,
(8) Supplementary reading, References." What is narrowly
conceived as sex education is treated in Appendix A under emergency
measures. The appendix also offers a suggested outline of a summer
school course for teachers, a form for a physical examination record
card, and a classified bibliography.

Those who realize the psychological implications of the problem
must be encouraged by the insight revealed in the organization of the
materials contained in this manual.

L. Z. 


5. A Case Book for Teachers and Parents.—This is the subtitle of a
volume¹ in which the author has drawn on his experience for a large
number of instances illustrating the numerous problems which confront
those who deal with adolescents. The work is at once a plea
for a more sympathetic treatment of youth and a compendium of
practical suggestions. While the material is very loosely organized
and the style is extremely discursive, the author will hardly be considered
at fault by those who are disinclined to consult more scientific
treatises. The discussion is limited to the description and disposition
of 'cases' as they arise. The evidence is not summarized to point out


¹ Stableton, J. K.: Your Problems and Mine in the Guidance of Youth.
Public School Publishing Co., Bloomington, Ill., 1922, pp. IX + 274.

the need for a fundamental change in educational policies and provisions.
Nevertheless, the book will be useful to those who must find
in vicarious experience first aid for pressing problems.


L. Z. 


6. A Bulletin on Educational Measurements.—The program of the
Ninth Indiana University Conference on Educational Measurements
furnishes the content of an interesting bulletin¹ which contains, among
other things, four short articles on educational provision for exceptional
children by Henry H. Goddard and three brief papers on reading by
William S. Gray. Professor W. P. Book and two research workers
at Indiana University report on the formulation and administration
of "Group Tests for Measuring Observational Learning and Accuracy
of Report," and present experimental evidence on the educability of
the traits in question and their relation to intelligence and school
training.

The reader will be surprised to find Prof. W. M. Black's searching
philosophical discussion of "Three Remedial Principles in Education"
in the same bulletin.

L. Z. 


7. The Relation of Speed and Accuracy of Performance.²—In judging
the lengths of lines, the magnitude of weights and the quality of handwriting,
Garrett found that increasing the rate of judgment from one
every four seconds to one in two seconds brought about a gradual
increase in accuracy which could not be maintained under greater
speed—one or one-half second intervals. In simple motor acts, such
as thrusting and tracing, an inverse relation between speed and
accuracy was apparent. The author concludes, from the experiments
upon judgment at least, that for each individual there is an optimum rate
which permits most appropriate readjustment between reactions. The
optimum interval catches the subject at the peak of the mental integration.
Whether the optimum rate could be appreciably increased by
practice was not disclosed with certainty.

A. I. G.

¹ Ninth Conference on Educational Measurements. Indiana University Bulletin,
The Extension Division of Indiana University, Bloomington, Vol. VII, No. 12, 1922,
pp. 140.

² Garrett, H. E.: A Study of the Relation of Accuracy to Speed. Archives of
Psychology, Columbia University, New York, No. 56, 1922, pp. 104.

8. Intelligence and Tonsils.¹—Using the control group technique,
the influence of the removal of diseased tonsils in the case of 236 children
was determined by repeated examinations before and after for a
period of a year or more. An increase in weight—slight during
the first but greater during the second half year following the operation—was
the only trait in which the "tonsil" group exceeded the control
group. The removal of tonsils has no effect on the growth of
height, strength of grip, speed of tapping, or on performance in the
Healy or Stanford-Binet Intelligence Tests; at least, not within the
year following removal. The author accounts for the improvement in
school work in terms of increased health, vigor, and "volitional and
emotional normality."

A. I. G. 


9. The Effect of Mood upon Performance.²—The student subjects,
previous to a series of tests of mental and motor functions, indicated
their temporary mood on a rating scale which embraced steps from the
"least" to the "greatest cheerfulness ever felt." Subjective estimates
and tests were repeated daily for a month or more. With improvement
from practice eliminated, correlations between performance and
cheerfulness were computed for each subject. The coefficients clustered
about a central tendency of approximately zero, i.e., the influence
of mood upon achievement was scarcely appreciable. There was,
furthermore, no substantial evidence that some individuals, much more
than others, are functionally impaired by the "blues." Mood, insofar
as this investigation goes, falls in the same category with "feelings of
fatigue," unusual temperature and humidity, noise and other influences
whose effect on achievement, often subjectively estimated as great,
turns out to be insignificant when objectively determined.

A. I. G. 
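
The computation the review describes, a correlation between one subject's daily cheerfulness ratings and his test scores, can be sketched as follows. This is only an illustrative sketch in Python: the function name `pearson_r` and the seven days of ratings and scores are hypothetical values invented for the example, not data from the monograph.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical record for one subject: a daily mood rating and a
# daily test score (practice effects assumed already removed).
mood_rating = [3, 7, 5, 2, 8, 6, 4]
test_score = [52, 52, 51, 50, 50, 52, 50]

r_subject = pearson_r(mood_rating, test_score)
print(f"r between mood and performance: {r_subject:.3f}")
```

A coefficient near zero for subject after subject is what the review means by an influence that is "scarcely appreciable."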


¹ Rogers, Margaret Cobb: Adenoids and Diseased Tonsils: Their Effect on
General Intelligence. Archives of Psychology, Columbia University, New York,
No. 60, 1922, pp. 70.

² Sullivan, E. T.: Mood in Relation to Performance. Archives of Psychology,
Columbia University, New York, No. 63, 1922, pp. 71.

10. The Influence of Incentive and Punishment upon Reaction-Time.¹—Measurements
of the speed of motor reaction to an auditory
stimulus were recorded under three conditions: (1) "Normal," in
which only general directions for procedure were given; (2) "incentive,"
in which the record for the preceding trial was disclosed; and
(3) "punishment," in which the subject, by means of a mechanical
device, was given an electric shock for relatively slow reactions. The
average saving due to "incentive" was 6 per cent; that due to "punishment"
15 per cent of the "normal" time. In terms of sigma (thousandth
of a second) the amounts of saving were 8 and 20, respectively.

A. I. G. 
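
As a rough check on the figures quoted in this review, the percentage savings and the savings in sigma should imply about the same "normal" reaction time. The sketch below, in Python, simply divides each saving by its percentage; the helper name `implied_normal_time` is an assumption made for illustration, and the review itself does not state the normal time.

```python
def implied_normal_time(saving_sigma, saving_fraction):
    """Normal reaction time (in sigma, i.e. thousandths of a second)
    implied by an absolute saving and the fraction it represents."""
    return saving_sigma / saving_fraction


# Figures as quoted in the review: 8 sigma is 6 per cent,
# 20 sigma is 15 per cent of the "normal" time.
from_incentive = implied_normal_time(8, 0.06)
from_punishment = implied_normal_time(20, 0.15)

print(f"normal time implied by incentive saving:  {from_incentive:.1f} sigma")
print(f"normal time implied by punishment saving: {from_punishment:.1f} sigma")
```

Both figures come to roughly 133 sigma, so the two ways of reporting the saving agree with each other.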


11. Psychology Classics.—Professor Knight Dunlap has conceived
a splendid idea in his plan to reprint older books or articles that have
been of significance in the history of psychology. To this proposed
series he gives the name "Psychology Classics" and Volume I of this
series has just appeared.² This contains a reprint of James' article
first published in Mind (1884) and of his chapter on the emotions in
his Principles of Psychology (1893). Along with these two selections
we have also a new and very happy translation of Lange's Ueber
Gemüthsbewegungen, made by Miss Istar A. Haupt from Kurella's
German version. According to the editor this is the first English
translation of Lange's important work. We thus have in one handy
volume the original statement of what is now generally known as the
James-Lange Theory of the Emotions. The volume will be appreciated
by all students of the emotions and of the history of psychology.

R. P.


12. How One Superintendent Made Use of Standard Tests.—In this
book³ we have the story of how a district superintendent made use of
standardized tests in his schools and a very interesting story it is.

¹ Johanson, Albert M.: The Influence of Incentive and Punishment upon
Reaction Time. Archives of Psychology, Columbia University, New York, No. 64,
1922, pp. 63.

² Lange, C. G. and James, W.: The Emotions. Psychology Classics, Vol. I.
Williams and Wilkins, Baltimore, 1922, p. 138.

³ Brooks, S. S.: "Improving Schools by Standardized Tests." Houghton
Mifflin Company, 1921, p. 2?8.

As the author himself says, "It is the story of how a corps of faithful,
hard-working, but mostly untrained teachers, with the aid of an
inexperienced superintendent, put standardized tests and measurements
to practical use throughout a school system to the considerable
advantage of all concerned." The author shows us step by step just
what he did, and not the least interesting is the ingenious way in which
he brought his teachers to see the need of tests and to feel that the
results were something that would be of help to them.

Most of the tests used were educational tests, and the progress
of pupils was measured. In the schools, as a whole, there was a tendency
to rank decidedly above grade in arithmetic. This the author
believes was due to the large amount of time spent on this subject,
and it was decided to cut down the time devoted to arithmetic. On
the other hand, the reading scores were "scandalously low," and there
seemed to be many reasons for this. The superintendent and teachers,
therefore, concentrated on the problem of reading. Several chapters
in the book deal with the topic of reading and indicate what methods
were employed to improve in this subject. And, lastly, we are told
what was done to help the children in their methods of study.

The book is not a text on tests or measurement or reading or
methods of study. A casual glance at the book makes one feel that
it is a mixture of all these things. And so it is, and for a very good
reason. Standard tests showed certain deficiencies in the school
system, and in this particular case the results of the tests made
superintendent and teachers concentrate on the problems of reading
and children's methods of study. In other school systems, tests
will reveal other deficiencies and other problems for study. And
this is the real justification for the use of tests, namely, that they lead
on to further problems and make more efficient teachers and pupils.
Every superintendent ought to read Mr. Brooks's book, not for the
purpose of doing just what he did, but as a suggestion as to what
problems can be attacked by means of educational and intelligence
tests.

R. P.


13. A Scientific Analysis of the Various Types of Silent Reading.¹—Those
who are familiar with earlier reading studies from the University

¹ Judd, Charles Hubbard and Buswell, Guy Thomas: Silent Reading: A Study
of Various Types. Supplementary Educational Monographs, No. 23, Department
of Education, University of Chicago, Chicago, 1922, xiii + 160.

of Chicago laboratories will welcome the announcement of another
monograph, the seventh of an illustrious series dealing with some phase
of that subject. The purpose of this study is to ascertain the fundamental
differences in the mental processes involved in various types of
silent reading and to contribute scientific data from which the somewhat
obscure symptoms of success and failure may be ascertained.
The substitution of objective evidence for unanalyzed assumptions
should lead to revised conceptions of reading ability, and the practical
implications should affect instruction in every subject in grades above
the third.

Ninety plates and 21 tables are presented and the reader can discern
the behavior of the eyes of a number of selected subjects in a variety
of silent reading activities. The materials used include paragraphs of
the Gray Oral Test, simple prose fiction and excerpts from poems of
various degrees of difficulty, passages from geography, rhetoric and
algebra texts, French prose and selections from a French grammar,
Latin prose and selections embodying two languages. The subjects
read and sometimes reread for various purposes, and the records
enable the reader to study the effects of changes in purpose and attention,
as well as adjustments necessitated by differences in content,
difficulty and language. We quote from the monograph, "Whenever
the mental processes of pupils show fundamental differences, practical
school procedure will have to fit its methods to those differences. The
program of co-operation between science and practical teaching is
easy to lay out when we thus see the intimate relation between methods
and psychological distinctions. The duty of the scientist is to devise
methods of discovering and describing fundamental distinctions.
The duty of the teacher is to develop practical ways of dealing with the
various kinds of mental processes which are pointed out."

Some sweeping assumptions of uncritical psychological thought 
are refuted: 

“There is a popular psychology which assumes that, when the
reader sees a word, the interpretation is tied by what older psychologists
called an association and by what some recent writers have called
a bond, to the impression received. The notion that a word and
its meaning are two fixed pieces of experience which can be tied
together, is a purely mechanical theory and not adequate as a basis for
real teaching. The simple mechanical explanation is very satisfying
to some minds because it reduces all teaching to the same formula.
One can teach little children and big children the same way. All one
has to do is to set up more and more combinations (bonds) and
strengthen them. This type of popular psychology finds very little
support in analytical studies such as the present monograph reports.
A printed page turns out to be a source of a mass of impressions
which the active mind begins to organize and arrange with reference
to some pattern which it is trained to work out. The business of the
student of educational science is to discover the various ways in
which impressions can be organized. Thus the pattern or attitude
determines the process. Mental life is a complex of organized attitudes—not
a collection of mechanical associations or bonds.” This pregnant
sentence was hidden in a paragraph and is here given the emphasis
warranted by its significance. Such fundamental considerations
are presented in Chapter I. The next three chapters give the data.
The possible reactions to difficulties of vocabulary, sentence structure
and logic are recorded in Chapter II. Chapter III shows that the
effects of changes in attention are not the same for all subjects.
Some pupils seemed to be unable to control or change the level of
attention when asked to study hard, read carefully, skim or prepare
to repeat verbatim. Most of the records show that effort results in a
narrowing of the span of recognition, lengthening of the pause or
fixation, movements of the eye back over the line (regressive movements).
Some records show that the subject evades difficulties in silent reading
by passing over them without a feeling of responsibility.

Chapter IV deals with analytical study. The plates show that
the mental tension and the process involved are very different from
those involved in reading, and that the grade-placement and time
allowance for such study may be determined experimentally.

Chapter V shows conclusively that Latin students do not actually 
read, but fumble and puzzle out meanings in a highly ineffective 
way. The number of fixations to the line of Latin prose is astounding. 
The utter helplessness of the Latin student debarred from reference 
to his vocabulary reminds one of the futile pacings of caged animals. 
The influence of the direct method of teaching French may be noted 
when French and Latin records are compared. 

Chapter VI is full of suggestions as to practical implications and
the necessity for the analytical determination of the bases of instructional
policies.

While we had hoped to find reports of the effects of various types
of training, and thus evidences of the need and effect of specific
instruction, that may not have been possible in this study. High
school people should see the necessity and feasibility of scientific
attack on more of their problems from this significant contribution
to the literature of educational psychology.


L. Z. 


14. Achievement "Norms" for Subnormal Children.¹—The results
of the first survey of a selected group of subnormal children are
presented in a recent bulletin over the signature of Dr. Wallin.

Various degrees of subnormality are represented. The tests
used were part of the Ayres Spelling Scale and Starch Spelling Tests,
the Gray Oral Reading Test, and the Cleveland Arithmetic Test.
The results of individual examination by some form of the Binet
Test are also given. The writer considers the data of value in
determining the classification and educational treatment of subnormal
children elsewhere.

Scores are presented in sufficient detail to permit a variety of
careful comparisons and the histories and description of outstanding
cases lend concreteness to the discussion.

L. Z. 

15. The Reading Process.²—This is the title of a new book which
aims to put at the disposal of teachers the fundamental considerations
underlying method in the teaching of reading. The content of the
book—as well as its arrangement—has been used in teacher-training
courses. In the opening chapters the discussion is concerned with the
sociological and psychological aspects of reading as an elaborate form
of language behavior. The evolution of all written language is traced,
and the idiosyncrasies of English spelling are shown to have a bearing
on problems of instruction.

After a brief historical sketch of methods used in teaching beginners
to read, there follows a concise summary of the leading reading investigations.
Three chapters cover as many fields of investigation. The
discussion of oral and silent reading gives the quantitative experimental
evidence upon which recent tendencies are based. The future of

¹ Wallin, J. E. Wallace: The Achievement of Subnormal Children in Standardized
Educational Tests. Bulletin No. 7, Series XX, Miami University, 1922,
p. 97.

² Smith, William A.: "The Reading Process." The Macmillan Co., New York,
1922, pp. XII + 287.

school readers is discussed in the light of their evolution. The final
chapter is devoted to a survey of standardized tests for measuring
reading ability. At the end of each chapter there is appended a
selected bibliography. Over a hundred titles are listed and these
represent the work of more than sixty contributors. The book contains
numerous illustrative plates and tables as well as an index.

L. Z. 


16. A Study of the Traits of Chinese Students in America.—Most
of us who have attended an American university have seen
the Chinese student. We know him to be a clean, serious, earnest,
thorough young man or woman, enthusiastic about the academic
activities of the campus, little participating in the social or athletic
activities. Have we realized that these Chinese students in our
American universities are going back to be leaders in Chinese life,
and that they are here obtaining that training which will enable them
to achieve that leadership and wield it for the common good? How
to make careful selection of the student and plan that his stay here
may be most profitable is, therefore, a distinct problem for those with
authority. Dr. Chu has made a significant contribution¹ in determining
certain of the factors which contribute to the success of Chinese
students while in America. He has found that knowledge of English
is correlated highly with both scholarship (r = 0.710) and leadership
(r = 0.679); but that knowledge of Chinese correlates much lower with
scholarship (r = 0.389) and leadership (r = 0.309). Knowledge of
English and knowledge of Chinese are totally uncorrelated (r = 0.024).
Dr. Chu says, "Since knowledge of English is correlated rather high
with both scholarship and leadership and since knowledge of Chinese
is correlated rather low with both of them, it is evident that in the
preparation of Chinese students to be sent to America more emphasis
should be placed upon knowledge of English and less upon the knowledge
of Chinese if those students are expected to do well in scholarship
and leadership in America." A correlation, however, does not necessarily
imply any such causal relationship. It might be that the higher
correlation of English than Chinese obtains through a higher common
correlation with a third factor such as intelligence. Dr. Chu partly


* Chu, Jennings P.: Chinese Students in America: Qualities Associated with
their Success. T. C. Contributions to Education No. 127.
realizes this when he says, "it is very likely and safe to say that
intelligence has a high correlation with both knowledge of English and
Chinese."

Dr. Chu also finds that length of time spent in this country has
little relation to scholarship, leadership or knowledge of English.
Evidently, these are things that depend more on innate ability than
on continued experience. Likewise, knowledge of Chinese correlates
low with time spent in this country, indicating that the forgetting of
the mother tongue is a slow process. He concludes that "in order to
acquire advanced knowledge in a short period, say, 3 or 4 years, it is
almost imperative to send out only those students who have received the
A.B. degree in China or who have had a training which will qualify
them to enter the graduate schools of America."

Dr. Chu met many difficulties in the statistical treatment of his
study, involving as it did the judgment of associates scattered in many
universities. He has made two signal contributions along this line.
One is a table (p. 13) giving the sigma positions of ranks for a given
number of people ranked. This allows comparison of judgments
made in different places. For instance, if an individual in University
A is ranked first out of 11 people his sigma position is 1.81. If another
individual is ranked first out of six people in University B his sigma
position is 1.49. The other contribution is that a reliability of 0.90
or over is found from 16 judgments of knowledge of Chinese, 16 judgments
of the knowledge of English, 24 judgments of leadership or 20
judgments of scholarship. Criticism of the unreliability of individual
judgments has recently been often voiced. Dr. Chu has given us
exact information of the number of independent judgments necessary to
achieve any desired reliability from judgment ratings.
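
The link between the number of pooled judgments and their reliability can be sketched with the Spearman-Brown formula. This is an assumption: the review does not say which formula Dr. Chu used, and the function names below are illustrative only. The single-judgment reliability is backed out of the statement that 16 judgments give 0.90.

```python
import math

def pooled_reliability(r_single: float, n: int) -> float:
    """Spearman-Brown: reliability of the average of n independent judgments."""
    return n * r_single / (1 + (n - 1) * r_single)

def single_reliability(r_pooled: float, n: int) -> float:
    """Invert Spearman-Brown: reliability of one judgment, given n pooled."""
    return r_pooled / (n - (n - 1) * r_pooled)

def judgments_needed(r_single: float, target: float) -> int:
    """Smallest number of judgments whose pooled reliability reaches target."""
    n = target * (1 - r_single) / (r_single * (1 - target))
    return math.ceil(n - 1e-9)  # small tolerance against float round-off

# Assumption: "0.90 from 16 judgments" (knowledge of English or Chinese).
r1 = single_reliability(0.90, 16)
print(round(r1, 2))                  # reliability implied for a single judgment
print(judgments_needed(r1, 0.90))    # recovers the 16 judgments quoted
print(judgments_needed(r1, 0.95))    # judgments needed for a stricter target
```

On this reading, traits that need more judgments to reach 0.90 (24 for leadership, 20 for scholarship) are simply those on which single judges agree less.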

Dr. Chu has produced facts which ought to be of great importance 
in determining policies regarding the choice and preliminary education 
of Chinese students coming to this country.

Percival M. Symonds
University of Hawaii.
A SECOND APPROXIMATION TO THE CURVE OF THE
DISTRIBUTION OF INTELLIGENCE OF THE
POPULATION OF THE UNITED STATES
WITH A NOTE ON THE STANDARDIZATION
OF THE STANFORD REVISION
OF THE BINET-SIMON SCALE

PERCIVAL M. SYMONDS
University of Hawaii

The first approximation to the curve of distribution of intelligence
of the population of the United States is the IQ curve, a normal curve
of error obtained in the standardization of the Stanford Revision of
the Binet-Simon Scale.

The method employed in this paper to obtain a second approximation
is roughly as follows:

Determine the curves of distribution of the 9 occupation groups as
given in Table XIV on page 63 of Volume IV of the 1910 (thirteenth)
census of the United States, making the area of these proportional
to the percentage which they are of the total population (Table VIII,
page 40, 1910 census), and then determine the cumulative curve of
distribution by adding the ordinates of the separate curves.

The steps in the process more precisely are:

1. Determine averages of medians, Q1 and Q3 (scores on Army
Alpha) of the occupations in each occupation group, weighting the
separate occupations roughly in proportion to the number in the
group. The data for the medians, Q1 and Q3 of the various occupations,
I take from the table on page 276 in the article by Fryer entitled
Occupation Intelligence Standards in School and Society, Vol. XVI,
Sept. 2, 1922, page 401.
Agriculture, Forestry and Animal Husbandry



, 

Median 


Weights 

^ population 

P&nucra. 


1 58 

83 

■1 

QH|H 

Eana laborow... 

13 

1 21 

47 

■■ 


FiBhetmon. 



61 

1 


Lumbermen. 

18 

35 . 

02 

1 

1 

2 

161,000 

12,060,000 out of 
12,050,000 , 

Weighted average,... 


80 

66 




The standards for farm laborers are taken to be the same as for
construction laborers given in Fryer's table. I have no data to allow
me to take independent figures for farm laborers. All through this
paper laborers are given the standards 13, 21, 47.


Extraction of Minerals

_I — .. ^ 


1 

Q. 

l 

Median 

1 

1 

Weights 

Population 

1 

Minere... 

, 40 


71 

92 


Pommen.. 

' 69 

77 


2 


Exeoutives. 

81 ^ 


137 

3 


1 


1 



964,000 out of 






966,000 

Weighted average.. 

42 

61 

74 

1 
Manufacturing and Mechanical Industries

                        Q1    Median   Q3    Weights   Population
Bakers                  40    60       87    1            90,000
Blacksmiths             ao    01       82    2           241,000
Masons                  10    40       CO    2
Carpenters              44    06       88    8           817,000
Printers                30    00       03    1           128,000
Electricians            67    81       109   1           136,000
Stationary engineers    30    66       81    2           231,000
Firemen                 19    27       63    1           111,000
Foremen                 01    70       111   2           176,000
Laborers                13    21       47    26        2,400,000
Machinists              40    03       8D    6           479,000
Minor executives        81    100      137   1           104,000
Painters                38    60       81    3           334,000
Plumbers                44    00       88    1           148,000
Leather workers         10    30       41    2           216,000
Textile workers         18    20       00    7           661,000
Shoemakers              38    66       70    1            70,000
Tailors                 42    06       89    2           206,000

Weighted average        28    42       68              6,794,000 out of
                                                      10,659,000


Important occupations omitted from the tabulations in this
group are:

Dressmakers ..................   449,000
Contractors ..................   174,000
Molders ......................   121,000
Milliners ....................   128,000
Manufacturers ................   257,000
Sewers .......................   291,000
Semi-skilled operatives ...... 1,676,000

                               3,096,000

At first sight it looks as though the average over-neglected the
contractors and manufacturers who are well above the group average
intelligence. But their combined weighting would be but 4 as against
16 for the remaining semi-skilled operatives who are probably below
the group average in intelligence, if we may judge by the two
semi-skilled occupations that have been included: leather workers and
textile workers. On the whole, the weighted averages, if anything, are too
high, rather than too low. Probably the 1920 census will give more
weight to automobile mechanics.


Transportation



Qi 

Median 

Qi 

Weights 

Population 

Teameters... 



72 

4 

408,000 

HosUeta... 

36 

66 

77 

1 

63,000 

Brakemen. 

41 


86 


03,000 

Ooitduotora. 


83 

106 

■■ 

122,000 

J'oremen. 



110 


70,000 

!LabordT8;i. 

13 

21 

47 

8 

702,000 

Bogbeera, looomotivo. 

44 


84 


00,000 

Firemen, loooinotive. 

44 

61 

84 


70,000 

Telegraph operators. 

67 

86 

110 

HI 

70,000 

Telephone operators. 



05 

1 

08,000 






1,888,000 out of 






2,038,000 

Weighted average. 

31 

40 

66 




The outstanding injustice here is the omission of chauffeurs, 
which would wield a much more important influence on the basis of 
1920 census returns. 


Trade



Qi 

Modian 

Q. 

Weights 

j Population 

Sales clerks.. 

88 

52 

06 

■1 

1,472,000 

Laborers. 

13 

21 

47. 

■9 

183,000 

Retail dealers. 

64 

86 

110 

12 j 

1,106,000 






2 , 860,000 out of 






3,614,000 

Weighted average.... 

42 1 

03 

08 




This is almost surely too low, omitting as it does bankers, insurance
agents, and real estate agents, but I have no means of estimating the
level of these occupations. Their combined weighting would be 3.
The figures for retail dealers were taken from Yerkes ('21), Vol. 15
of the National Academy of Sciences, Psychological Examination in
the United States Army, page 825 under the heading of stock-keepers.


Public Service



Q. 

Median 

I Q. 

Woighta 

Population 

Laborers. 

la 

21 

47 

1 

03,000 

106,000 

OiBoials. 

«i 


137 1 

1 

Folloemea. 

40 



1 

02i000 

77,000 

Soldiers and sailors. 

26 

44 

73 

1 

Weighted avorogo. 

41 

OL 

87 


307,000 out of 
^ 460,000 


The standards for soldiers and sailors are taken as a mean
between sailors 16, 32, 59 and the drafted United States forces in the
World War 36, 56, 87.


Professional Service




Median 


Weights 

Population 

Actors. 

■ 

H 

02 



28,000 

34,000 

16,000 

62,000 

118,000 

40,000 

33,000 

130,000 

32,000 

Artists. 



Chemists.. 

110 

101 



Civil engineers. 



Clergymen. 



Dentists. 




Draftsmen. 




Musicians. 




Photographers. 

80 



Teachers. 

B B 


Physicians. 


B M 



Nurses. 


B M 

8 

82,000 





Weighted average. 



m 


1,324,000 out of 
1,004,000 

— - _ 
Domestic and Personal Service


Woighta 

2 

1 


■ 


There are many unsatisfactory things in this group. The standards
for caterer have been given to hotel keepers and managers,
housekeepers and stewards, lodging and boarding housekeepers,
restaurant keepers, although the term caterer is used generally in a
much more restricted sense. Important occupations omitted are:


Barteodora. 

Janitors............ 

• ... 


.... 101,000 
.... 118,000 
.... 84,000 

Saioookcopors. 

. 


osiooo 

.... 188,000 

Other servants. 



.... 1,088,000 




1,042,000 


Probably the group, bartenders and saloonkeepers, will be greatly 
diminished in the 1920 census. 


Population 



1,850,000 out o! 
3,722,000 



Clerical Occupations

                    Q1    Median   Q3    Weights   Population
Bookkeepers         77    101      127   6           487,000
Shipping clerks     64    78       102   1            31,000
Clerks              74    98       121   0           640,000
Stenographers       73    103      124   3           317,000

Weighted average    74    99       122             1,526,000 out of
                                                   1,737,000
                                                 Q1    Median   Q3
Agriculture, forestry, and animal husbandry      26    39       65
Extraction of minerals                           42    51       74
Manufacturing and mechanical industries          28    42       68
Transportation                                   31    46       66
Trade                                            42    63       98
Public service                                   41    61       87
Professional service                             92    118      145
Domestic and personal service                          52       78
Clerical occupations                             74    99       122


2. The second step is to find the σ of these groups, σ = 0.7413
(Q3 − Q1) on the assumption that the distribution is normal, an
assumption which I make at this place.



                   Per cent   Median   σ     Σ
Agriculture        33.2       39       29    34.2
Mining              2.5       51       24    26.4
Manufacturing      27.9       42       30    37.8
Transportation      6.9       46       26    34.6
Trade               9.5       63       41    46.4
Public service      1.2       61       34    47.1
Professions         4.4       118      39    44.7
Domestic            9.9       52       31    37.0
Clerical            4.6       99       36    37.6
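
The conversion in step 2 can be checked with a short sketch. The constant follows from the normal curve, whose quartiles lie 0.6745 σ either side of the mean. The function name is illustrative, and the quartile pairs are the mining (42, 74) and clerical (74, 122) figures as read from the tables above.

```python
from statistics import NormalDist

# For a normal curve Q3 - Q1 = 2 * 0.6745 * sigma, so sigma = K * (Q3 - Q1).
K = 1 / (2 * NormalDist().inv_cdf(0.75))

def sigma_from_quartiles(q1: float, q3: float) -> float:
    """SD of a normal distribution with the given lower and upper quartiles."""
    return K * (q3 - q1)

print(round(K, 4))                            # 0.7413
print(round(sigma_from_quartiles(42, 74)))    # mining: 24
print(round(sigma_from_quartiles(74, 122)))   # clerical: 36
```

Both results agree with the σ column of the table, which supports reading the multiplier as 0.7413.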


3. Distributions with these σ are not representative of their occupation
groups because they are the average σ of several distributions
with different means. To obtain a truer SD (call it Σ) use the formula:

\[ \Sigma^2 = \sigma^2 + \sigma'^2 \]

where σ is the SD of the distribution of the average occupation of the
group and σ' is the SD of the distribution of medians of the various
occupations in the group.
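
A minimal sketch of step 3, assuming the agriculture figures read from the table (σ = 29, Σ = 34.2). The SD of the occupation medians is not printed in the paper, so the value below is a back-calculation and not a figure from the source. Function names are illustrative.

```python
import math

def group_sd(sigma_within: float, sigma_between: float) -> float:
    """Total SD of a group: within-occupation spread plus spread of medians."""
    return math.sqrt(sigma_within ** 2 + sigma_between ** 2)

def implied_between_sd(total_sd: float, sigma_within: float) -> float:
    """Back out the SD of the occupation medians from the two printed SDs."""
    return math.sqrt(total_sd ** 2 - sigma_within ** 2)

between = implied_between_sd(34.2, 29)     # agriculture
print(round(between, 1))                   # about 18 Alpha points
print(round(group_sd(29, between), 1))     # recovers 34.2
```

A spread of about 18 points is plausible for a group dominated by farmers and farm laborers, whose medians lie far apart.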

4. The fourth step is to find the ordinates of these distributions.
First find the Σ distances of the points -40, -30, -20, -10, 0, 10, 20, 30,
40, etc., from the medians of these distributions. Read from Table
II of Pearson's tables the ordinates corresponding to these Σ distances
and multiply by the percentages that each distribution is of the total
(Table VIII, p. 40, 1910 census, Vol. IV) and divide by Σ.
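
The whole procedure of step 4 can be sketched as a weighted sum of normal curves. The percentages, medians, and Σ values below are the step-2 table as best it can be read, so treat them as assumptions. The ordinates are ordinary probability densities and may differ from the printed table by a constant scale factor. All names are illustrative.

```python
import math

# (per cent of gainful workers, median Alpha score, Sigma) for each group.
GROUPS = {
    "Agriculture":    (33.2, 39, 34.2),
    "Mining":         (2.5, 51, 26.4),
    "Manufacturing":  (27.9, 42, 37.8),
    "Transportation": (6.9, 46, 34.6),
    "Trade":          (9.5, 63, 46.4),
    "Public service": (1.2, 61, 47.1),
    "Professions":    (4.4, 118, 44.7),
    "Domestic":       (9.9, 52, 37.0),
    "Clerical":       (4.6, 99, 37.6),
}
TOTAL = sum(p for p, _, _ in GROUPS.values())

def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))

def normal_cdf(x, mean, sd):
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def composite_ordinate(x):
    """Ordinate of the summed curve at Alpha score x."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in GROUPS.values()) / TOTAL

def composite_cdf(x):
    return sum(p * normal_cdf(x, m, s) for p, m, s in GROUPS.values()) / TOTAL

def composite_mean():
    return sum(p * m for p, m, _ in GROUPS.values()) / TOTAL

def composite_median(lo=-200.0, hi=400.0):
    """Bisect the cumulative curve for the 50 per cent point."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if composite_cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    for score in range(-40, 201, 40):
        print(score, round(1000 * composite_ordinate(score), 2))
    print("mean", round(composite_mean(), 2))
    print("median", round(composite_median(), 2))
```

With these inputs the mean and median come out close to the 50.66 and 48.41 reported for the composite curve, and the median falls below the mean, as expected for a curve skewed to the right.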
Ordinates of Distribution of Occupation Groups in Intelligence on Scale
of Army Alpha


Score 

oorre- 

epODding 

to Army 
Alphft 

BH 

1 

Min- 

iag 

Manu> 

fnotur* 

ins 

Trans* 

porlo- 

tlon 

Trulo 

Public 

sorvicii 

ProfcB* 

bIodbI 

group 

Do- 

nieaUo 

Clerl- 

eal 

Total 

ptvu* 

latlon 

-100 

m 


1 




1 < 


1 • * 

1 

- 90 

1 


2 

• • • 

... 


1 * 

■.. 

• 1 1 

3 

~ 80 

2 

t « 

4 

* • . 

1. 



1.. 

( » t 

1 

- 70 

0 

• » 

S 

1 

8 

, . 

t « 
• 1 t

10 

- 00 

16 

1 t 

18 

2 

6 

1 

• « 

3 

» • « 

44 

- 60 

as 


36 

4 


2 

$ 0 


( • • 

80 

- 40 

97 


07 

0 

19 

3 

• t 

12 

• « r 

174 

- 30 

126 

i 

110 

18 

20 

4 


mm 

B • t 

314 

- 20 

217 

3 

180 

32 

SO 


i 

mm 

1*1 

627 

- 10 

840 

7 

286 

64 

67 

8 

2 

05 

1 

828 


607 

16 

806 

82 

80 

11 

8 

00 

3 

1106 

10 

070 

28 

617 

no 

100 

14 

6 

140 

0 

1008 

20 

880 

48 

020 

lei 

188 

17 

0 

186 

12 

2014 

30 

938 

00 

711 

170 

190 

20 

14 

226 

21 

2337 

40 

070 

87 

747 

107 

184 

23 

22 

254 

34 

2818 

60 

022 

06 

732 

108 

201 

26 

31 

207 

61 

2S22 

60 

800 

SO 

C07 

184 

200 

26 

42 

262 

71 

2305 

70 

042 

73 

£66 

167 

207 

25 

69 

237 

02 

2064 

80 

478 

62 

446 

123 

106 

24 

. 00 

200 

no 

1001 

, 00 

320 

32 

326 

80 

170 

ai 

81 

167 

122 

1328 

100 

190 

17 

226 

60 

161 

18 

01 

116 

120 

1001 

110 

112 

8 

148 

30 

122 

16 

07 

78 

120 

731 

120 

50 

8 

84 

20 

06 

12 

08 

40 

107 

627 

180 

28 

1 

46 

10 

70 


06 

29 

88 

370 

140 

IS 

» • 

24 

5 

40 

0 

88 

10 

07 

268 

160 

6 


11 

2 

33 

4 

76 

8 

47 

136 

100 

2 
1

21 

8 

03 

4 

31 

13D 

170 

1 


2 


13 

2 

60 

2 

10 

69 

180 

, , • 


1 
1

37 

1 

11 

50 

too 





4 


27 



37 

200 

* 1 r 




2 


18 


a 

23 

210 

* » f 




1 


11 


1 

ID 

220 







7 



7 

230 







4 


• a « 

4 

240 





• « • 


2' 



2 

250 

• » . 






1 



1 

2QO 



• " 



•• 

1 



1 


The mean of this distribution is 50.66, the median 48.41, with σ
43.00. This compares closely with the results of testing in the Army.
On page 764 of the Army Report, I find the median Alpha score of the
white draft, native born, given as 58.9; for the white draft, foreign
born, 46.7; and for the North and South division of the colored draft,
respectively, 38.0 and 12.4. Lump last two together for a rough
median of the colored draft as 20. Weight these in the ratio 68, 15, and
17, approximately the percentage in the army of native born white,
foreign born white, and negro, and we obtain a composite median of
about 51. The close agreement of this median with my figures is
remarkable. Very nearly the same median on the same scale is found
for the distribution of intelligence of the whole population by two
partially independent methods.
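
The composite Army median can be worked through as a weighted average. The figures are taken as the native-born median 58.9, the foreign-born median 46.7, the lumped colored-draft median 20, and the weights 68, 15, and 17; these are readings of a damaged scan and should be treated as assumptions.

\[ \frac{68(58.9) + 15(46.7) + 17(20)}{68 + 15 + 17} = \frac{4005.2 + 700.5 + 340}{100} \approx 50.5 \]

This is the figure the text gives as "about 51," and it sits beside the mean of 50.66 found from the occupation curves.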

There has been much speculation as to whether the Army figures
are truly representative. Terman ('21) criticizes them on the ground
that the exemptions favored the superior levels. Lincoln ('22) disputes
this, arguing that if anything the exemptions favored the lower
intelligences and that the Army took superior men. The exemptions,
amounting to 6,973,000 out of 9,500,000 registrants, are divided as
follows:

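                                                   Per Cent

Occupational and industrial reasons ..............   0.8
Religious reasons ................................   0.6
Dependency .......................................  61.1
Already in military or naval service .............   8.0
Alien allegiance .................................  18.2
Disability .......................................   7.76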


Lincoln argues that exemption because of occupational and industrial
reasons is on the basis of experience rather than intelligence.
Likewise the Army report seems to think that the volunteers already
in the military or naval service were above average intelligence.
Granting that these two groups were above average (1^7 per cent), 
they are more than over-matched by the aliens and the physically,
mentally, and morally unfit (20.9 per cent) who are evidently below 
the average. And the large group of those having dependents—who 
can say that marriage is positively or negatively correlated with 
intelligence? The fact that the median of my determination of the 
distribution of the intelligence of the population computed on the 
basis of occupations agrees so closely with the Army median is strong 
evidence that the Army group is partial to neither the intelligent nor 
the unintelligent. 

The cumulative curve is strongly skewed to the right, notwithstanding
the fact that I made the assumption that the occupation
groups were normal distributions. In making this assumption, I
eliminated the imperfection in the Alpha scale which is seen in the
piling up of zero and extremely low scores. This skewness depends
on nothing except that the numbers in low-intelligence occupations
so outweigh the numbers in high-intelligence occupations. As a
matter of fact the true curve is probably even more skew than the
one here pictured. Take, for example, the group "manufacturing and
mechanical industries." In that group laborers have a weight of
25 as against the electricians with a weight of 1, foremen with a weight
of 2, and minor executives with a weight of 1. Again, in the group
"transportation," laborers have a weight of 8 as against foremen with
a weight of 1, conductors with a weight of 1 and telegraph operators
with a weight of 1. In these groups it is evident that were the
distribution built up by cumulating separate occupations these, too,
would be skewed to the right, making the total cumulation even more
strongly skewed.

Fig. 1. Curve of the distribution of intelligence of the population of the
United States on the scale of Alpha.

An inspection of the frequency curves of the different occupation
groups gives the impression that there is a lower limit to the intelligence
at which a man is socially acceptable as a worker, whereas there is no
such clearly defined upper limit. Five of the occupation groups seem
to disappear at about −40. Is it possible that −40 represents a
lower limit of intelligence that is occupationally acceptable or
necessary? The census figures of the feeble-minded in 1910 show that about
42,000 were classed as feeble-minded in special institutions and
almshouses. Granting that the real number in the country was 10 times
42,000, they would make no appreciable difference in the form of the
distribution, occupying an area no larger than the curve representing
those engaged in public service. On the other hand there appears to
be no such limiting upper level to intelligence. The fact that the
professional group exists with a median so much higher than any other
group suggests that the variation in this direction goes just as far as
chance permits.

I hope that these facts will impress all who read them with the
preponderance of numbers among the low-intelligence occupation
groups. We are so accustomed to think of the whole population in
terms of our own acquaintances and neighbors. I remember being in
a group which was listening to an English lady who had "seen" our
country and had reached the stage where she could give her
"impressions." It was just after the New York City municipal elections and
she was discussing the results. "It is strange," she said, "no one I
have met in this country has any use for Hylan and yet he was elected
by a large majority. Where are the people who voted for him?"
After all, we know only the people most like ourselves and forget that
there are the others. Workers in educational measurements have
committed the same errors that this English lady made. We have
obtained our standards in schools that we find in our districts, in
schools where we would send our own children, in schools run by men
who are in sympathy with educational measurements. The result is
an unconscious but nevertheless real selection.

It appears that the Stanford Revision of the Simon-Binet is
standardized from measurements of a selected group. Terman ('16) says,
page 62, "The method was to select a school in a community of
average social status, a school attended by all or practically all the
children in the district where it was located." What is a community
of average social status? On page 55, Terman says, "Figure 1 shows
the distribution of mental ages for 62 adults, including the 30 business
men and the 32 high school pupils who were over 16 years of age."
It will be noted that the middle section of the graph represents the
"mental ages" falling between 15 and 17. So Terman bases his average
adult standard on business men and high school students! High
school students will probably average an Army Alpha of 100 to 120,
and the business men probably about the same.
On page 286 in "The Intelligence of School Children" Terman gives
a table of IQ's of various vocational groups. I copy part of his table;
I add to his table the median Army Alpha score of these groups and
the Q1 where they are known.


Vocational group 

Median 

Qi 

Median 

Alpha 

Q> 


1 IH 

t 

Score 


College abudonts. 

100 

104 



BuainGBBTQOQ.. .. 

102 

97 



Bxprosa employees.. 

C5 

87 1 



Motorraen and oouduotora.... 

80 

70 

83 

04 

Fiiemon and polioemcn.* 

84 

78 

00 

46 

Salesgirls... 

S6 

77 

52 

88 

Hoboes and unemployed.. 

80 

71 




On the basis of these figures, if we take the median intelligence of
the whole population to be 48, Army Alpha, it would appear that the
median IQ of the general population would be about 80, say between
80 and 82, as Terman computes adult IQ's. What then is the median
mental age of the general population? 0.82 × 16 = 13.2! And
Lincoln says: "These results indicate that the Army Mental Age norms
were somewhat high and that the average man may have been slightly
under 13 mentally." Here then is a reconciliation of the divergent
facts that have troubled psychologists these 3 years. There has been
no error. The Stanford revision of the Binet-Simon Scale has been
standardized on selected individuals. It is true that the average
Stanford-Binet Mental Age of the average man is 13.2 years, or about
13 years.

This, then, is an explanation of the discrepancy between the fact
that the average mental age of the average adult is only 13.2 while
mental functions seem to continue to increase until perhaps the age of
18, as found by Brooks and others. The Stanford-Binet is imperfectly
standardized. Ability to do half of the tests in the 14-year group
represents "average adult" ability. Call this a mental age 16 or whatever
you wish. Then the others should be scaled up accordingly. The
results of this study seem to show decisively that the Stanford revision
needs restandardizing. It is not representative as it stands. One
hundred IQ on the Stanford Revision does not indicate the average
man; it represents a man considerably above the average. Probably
an IQ of 100 represents very nearly an Army Alpha score of 100. A
glance at the graph shows how far from being a representative median
this is.¹

There are several discrepancies that may be explained in the
light of this analysis. Garrison and Tippett ('22) found that the Otis
Advanced Test gives a higher mental age than the Binet-Simon Test.
This can well be explained by the selection of groups which were used
to standardize the tests. Undoubtedly Otis's norms represent a more
representative selection of the general population, coming as they do
from some 11,500 cases. As it was, Garrison and Tippett found that
the Otis Mental Ages ran from 1 to 2 years higher than the Binet and
averaged 17.6 months higher. Proctor ('20) gives a distribution of
Stanford-Binet IQ's in the high school as follows:


IQ          Number
125           14
120-124       11
115-119       11
110-114       16
105-109       11
100-104       15
 95- 99       16
 90- 94       11
 85- 89        7
 80- 84        1

             113


The median here is 107 IQ. Compare this with the accompanying
graph showing the distribution of high school pupils on the scale of
intelligence. Proctor has 31 per cent of the pupils in his group in
high school below the average intelligence. The actual distribution
of first year high school pupils on the Alpha Scale shows that less than
10 per cent are below average intelligence. To be consistent with the
facts which we have noted above, it would be more natural to assume
that Proctor was working with a typical high school group and that
the discrepancy lies in the standardization of the Stanford Revision.
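
The two figures quoted from Proctor's table can be recomputed from the grouped frequencies. The class limits are as read from the table, the top class is treated as a 5-point interval, and the use of a real lower limit half a point below each class is an assumption about how the median was taken. Names are illustrative.

```python
# (lower IQ limit of each 5-point class, number of pupils), lowest class first.
CLASSES = [(80, 1), (85, 7), (90, 11), (95, 16), (100, 15),
           (105, 11), (110, 16), (115, 11), (120, 11), (125, 14)]
WIDTH = 5
TOTAL = sum(n for _, n in CLASSES)

def percent_below(iq: int) -> float:
    """Per cent of pupils in classes lying wholly below the given IQ."""
    below = sum(n for lo, n in CLASSES if lo + WIDTH <= iq)
    return 100 * below / TOTAL

def grouped_median() -> float:
    """Median by linear interpolation within the class holding the middle case."""
    half = TOTAL / 2
    cumulative = 0
    for lo, n in CLASSES:
        if cumulative + n >= half:
            return (lo - 0.5) + WIDTH * (half - cumulative) / n
        cumulative += n
    raise ValueError("empty distribution")

print(TOTAL)                          # 113 pupils
print(round(percent_below(100)))      # 31 per cent below IQ 100
print(round(grouped_median(), 1))     # median falls in the 105-109 class
```

Both the 31 per cent and the median of 107 follow from the table as printed.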

¹ A corroboration of the above has come to my attention. In the Manual of
Directions accompanying the Terman Group Test of Mental Ability a table of
the mental age equivalents is given on page 10. The score on the group test
corresponding to a mental age of 16 is 136 and on the previous page a score of
134 is given as the median score of the high school junior. The average man has
the intelligence of the average high school junior? Preposterous!



78 


The Journal of Educational Psychology


I believe this study gives us considerable help in the problem
of obtaining unselected standardization. Where shall we go for the
average man? The average man has an Army Alpha score of about 48.
Fryer's table shows that representative occupations around this level
are masons, hospital attendants, station agents, miners, teamsters,
riggers, boilermakers, airplane workers, factory storekeepers, horse
shoers, salesclerks, hostlers, barbers, stationary engineers, cobblers,
horse trainers, caterers, bricklayers, auto truck chauffeurs, farmers,
concrete workers, printers, and bakers. To obtain representative
figures children and adults should be tested coming from social groups
of which the above listed occupations are typical, not high school
students and business men. There are as many in the population who
are less intelligent than the average semi-skilled workmen in the
occupations above as there are of those who are more intelligent.
Every person who wishes to obtain norms or standards representative
of the total population can do no better than to consider carefully
the occupation groups in which he proposes to do his testing. There
seems to be no better ready criterion.

Fig. 2. Curves of the distribution of intelligence of occupation groups on
scale of Army Alpha.

Fig. 3. Curves of the distribution of the intelligence of first year High
School students and first year College students as related to the
distribution of intelligence of the total population on the scale of Army
Alpha.

Suppose that the correlation between the intelligence of parents
and children was 0.50, certainly a low estimate. Then the mean of the
intelligence of children of parents of the professional group would be
given by the formula

\[ M_2 = m_2 + r \frac{\sigma_2}{\sigma_1} (M_1 - m_1) \]

where $M_2$ is the mean intelligence score of the children of
fathers who are in the professional group,
$m_2$ is the mean intelligence score of all children,
$r$ is the correlation between the intelligence of parents and
children,
$\sigma_2$ is the SD of the intelligence of all children,
$\sigma_1$ is the SD of the intelligence of all fathers,
$M_1$ is the mean intelligence of fathers in the professional group,
$m_1$ is the mean intelligence of all fathers.

We may assume that $\sigma_1 = \sigma_2$ and $m_1 = m_2$ since we are
hypothetically measuring on some common scale of intelligence. Then on
the scale of Army Alpha

\[ M_2 = m_2 + r (M_1 - m_1) = 51 + .50(118 - 51) = 84.5 \]

That is, there is always a regression of the children toward the mean of
the whole group whose correlation is given. Undoubtedly the correlation
is much higher than 0.50 for the whole population, but this
hypothetical example illustrates the principle.
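
The regression step can be sketched in a few lines, taking 51 as the mean of the whole population, 118 as the mean of the professional group, and 0.50 as the correlation, as the worked figures imply. The function and parameter names are illustrative, not the author's.

```python
def expected_child_mean(parent_group_mean: float,
                        parent_mean: float,
                        child_mean: float,
                        r: float,
                        sd_parents: float = 1.0,
                        sd_children: float = 1.0) -> float:
    """Regression estimate of the mean score of children of a parent group."""
    slope = r * sd_children / sd_parents
    return child_mean + slope * (parent_group_mean - parent_mean)

# Equal SDs and equal means for fathers and children, as the text assumes.
print(expected_child_mean(118, 51, 51, 0.50))   # 84.5

# A higher correlation pulls the children less far back toward the mean.
print(expected_child_mean(118, 51, 51, 0.75))   # 101.25
```

The second call illustrates the closing remark: if the true correlation is higher than 0.50, the children of the professional group stay closer to their fathers' level.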

But there is another point to be considered: boys and girls in
the upper grades and high school represent a double selection. They
are not only selected because of their fathers, but they are already
selected because of their own capacities and their own prospective
vocations. Because of this double selection I am inclined to believe
that to restandardize the Stanford Revision it will not be enough to
simply scale the tests down in the ratio of 82/100. The amount of
selection grows more intense in the upper levels where we are approaching
more nearly the adult selection and away from the simple parental
selection. The regression toward the mean is greater for children low
in the grades than for high school boys and girls who are beginning to
experience a selection on their own account. A true standardization
must take into consideration both the occupation group from which the
children are selected and the amount of regression of the mean of these
offspring toward the mean of the race.


In building up my distribution of the intelligence for the population
I have made three assumptions which evidently depart from
the truth.

1. I assume that the separate occupations have normal distributions.
I felt that because of the evident inequality of the units of the
Alpha scale at its lower extremity, this method would not actually
distort the truth more than some more exact method. The operation
in which I used this assumption was in taking σ = .74 (Q3 − Q1), which
is only true for a normal distribution.

2. I assumed that the composite occupation group was a normal
distribution. Another method that I might have employed was to
build up the complete distribution as a summation of separate
occupations. Here again I believed that because of the inequality of the
units of the Alpha scale at its lower extremity this method would not
distort the truth more than some exacter method. However, as I
showed above, if anything, the occupation groups would also skew to
the right, making the total distribution still more skewed.

3. I assumed that the separate occupations have the same variability
in applying

\[ \Sigma^2 = \sigma^2 + \sigma'^2 \]

where σ is the SD of the distribution of the average occupation of the
group and σ' is the SD of the distribution of medians of the various
occupations in the group.

This formula is strictly true only when all the distributions are equal
in variability, but it is a close enough approximation to take an average
of the separate distributions when they are of different variability.
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Conclusions 

1. A distribution of the intelligence of the population of the United
States built up on the basis of occupation-intelligence standards and
the number in each occupation compares very closely in mean and
variability with the distribution found in the army.

2. The distribution is distinctly skewed to the right.

3. There seems to be a lower limit to the intelligence which is
occupationally acceptable, but there seems to be no such upper limit.

4. The Stanford Revision of the Binet-Simon was standardized
from a superior group. The median IQ of the general adult population
is in the neighborhood of 82 IQ on the Stanford Revision.

5. The Stanford Revision should be restandardized so that the
tests on the average adult level (in the 14-year group) may be labelled
with a mental age which more nearly coincides with the age at which
mental growth stops.

6. In the future consider the occupations represented in the groups
selected for the purpose of standardizing or obtaining norms.

7. In using occupation groups for standardization bear in mind
that there is a regression of children toward the mean of the race from
the mean of the fathers.

8. This regression is greater for young children than for older
children, where a second selection is taking place looking forward to
their own occupations based on their capacities.

THE GRAPHIC RATING SCALE

MAX FREYD

University of Pennsylvania

I 

Owing to the immense importance of ratings in psychological
experimentation, both pure and applied, constructive effort should
be directed toward improving the means whereby ratings are obtained.
For many types of psychological phenomena ratings are the only
practical equivalents of objective measurements, and this applies
especially though not exclusively to introspective or verbal report
data. It is, of course, the aim of the psychological experimenter
ultimately to be able to present his data in the form of quantitative
or qualitative measurements objectively arrived at, but where under
present conditions this is impossible, he should seek the least
subjective form in which his data may be presented. An effective
rating scale may fall short of that much-desired objectivity, but in
skilled hands it will provide measures equal in accuracy to those
obtained by slipshod objective means. Rating scales have a wide use
in psychology, and their construction demands the same skill and care
as the laboratory set-up.

Galton's study of mental imagery furnishes a splendid example
of the possibilities of a rating scale in pure psychology. The data
of studies in mental imagery such as Galton's consist of verbal
statements which are interpreted by the experimenter. Differences in
reports on mental imagery are probably conditioned largely on the
subject's originality of expression. This error may be diminished if
the subject is not required to make a spontaneous report, but
indicates which of a number of descriptive phrases furnishes the most
accurate description of his own imagery. These phrases can be
arranged in a scale representing gradually increasing degrees of mental
imagery. Galton supplied the framework of several such scales, but
did not make use of them in the manner here indicated. He obtained
from a large number of individuals verbal accounts of vividness of
imagery, and ranged representative accounts in a scale from those
expressing a vivid imagery to those expressing the practical absence of
imagery. If one is interested in obtaining quantitative measures of
imagery, one need only attach numerical equivalents to these
descriptive phrases, and allow the subject to indicate the number of
the phrase which is the most accurate description of his own imagery. A
fair equivalent of an objective measure is thus obtained.

Galton's Scale of Vividness of Mental Imagery from
Introspective Reports Furnished by 100 Men

Highest.—Brilliant, distinct, never blotchy.

First Suboctile.—The image once seen is perfectly clear and bright.

First Octile.—I can see my breakfast-table or any equally familiar
thing with my mind's eye quite as well in all particulars as I can do
if the reality is before me.

First Quartile.—Fairly clear; illumination of actual scene is fairly
represented. Well defined. Parts do not obtrude themselves, but
attention has to be directed to different points in succession to call
up the whole.

Middlemost.—Fairly clear. Brightness probably at least from one-half
to two-thirds of the original. Definition varies very much, one
or two objects being much more distinct than the others, but the
latter come out clearly if attention be paid to them.

Last Quartile.—Dim, certainly not comparable to the actual scene.
I have to think separately of the several things on the table to bring
them clearly before the mind's eye, and when I think of some things
the others fade away in confusion.

Last Octile.—Dim and not comparable in brightness to the real
scene. Badly defined with blotches of light; very incomplete; very
little of one object is seen at one time.

Last Suboctile.—I am very rarely able to recall any object whatever
with any sort of distinctness. Very occasionally an object or image
will recall itself, but even then it is more like a generalized image than
an individual one. I seem to be almost destitute of visualizing power
as under control.

Lowest.—My powers are zero. To my consciousness there is almost
no association of memory with objective visual impressions. I recollect
the table, but do not see it.

In applied psychology, Downey demonstrates how a rating scale
may be used in scoring a test which does not yield an objective
quantitative measurement. The test referred to is the Resistance to
Opposition Test of the Will-Profile Series. In this test the subject is
required to write his name with his eyes shut. While he is engaged in
this, the tester places an obstruction, such as a small pasteboard box,
in front of the pen-point, exerting enough pressure so that considerable
effort is required to continue writing. In estimating the reaction of
the subject to this unexpected opposition, Downey makes use of the
scale of deciles reproduced below.


Decile Scale for Scoring Reaction in Resistance to Opposition
Test of the Will-Profile Series

10. Strong pressure against obstacle. Writing maintained at initial
level; firm, strong stroke, usually enlarged characters. No urging.

9. Very strong counter-pressure on level, but with some sacrifice
of form; letters blurred or telescoped; trembling or undue speed, or
other evidence of agitation. No urging.

8. Very rapid and energetic dodging with or without maintenance
of form; often, increase in size of letters. Or strong pressure but
not on level. No urging.

7. Very deliberate but gentle counter-pressure. Or deliberate
dodging with mild counter-pressure and little loss of form. Or holding
examiner's hand back with the left hand, or protecting one's own
hand with left hand. No urging.

6. Evasive reaction: Reversal of movement; shift of position;
jumping of obstacle. No urging.

5. Very mild counter-pressure; loss of form. No urging.

4. Strong pressure AFTER URGING and READJUSTMENT
with maintenance of form.

3. Moderate pressure after urging with maintenance of form. Or
deliberate dodging after urging.

2. Moderate counter-pressure after urging with some attempt to
preserve form.

1. Feeble pressure after urging with loss of form.

0. Absolute passivity in spite of urging. Typical remarks: "I
can't," "How can I when you stop me?"

Plant describes a rating scale which is to be used in both pure and
applied experimentation. This rating scheme for conduct has for its
practical purpose the betterment of nurses' notes in psychiatric
hospitals—giving the nurse an idea of what is wanted, and causing her to
observe and state facts rather than draw general conclusions. The
scale will also apparently be used in putting to experimental proof
"Helmholtz's assertion that the law of conservation of energy does not
apply to mental phenomena." Ratings are made with this scale on
16 partitions or aspects of personality, each partition being measured
by a scale of approximately 10 phrases somewhat like the one quoted
from Downey. Plant obtains the mean standing in these 16 partitions
at any one complete rating, and traces the fluctuations of this mean
over a period of time. The standard deviation is also traced. Plant
considers the latter a valuable diagnostic sign, its size being an index
of the patient's mental confusion. The scale for grading attention
is given below.

Plant's Scale for Rating Attention

One of a Number of Such Scales for the Use of Nurses in Psychiatric
Hospitals

1. Stuporous.

2. Can't hold attention long enough to do even commonest
things such as completely dressing self or eating a meal.

3. Dresses self but can't hold attention long enough to do any
particular work.

4. Can do only childish pieces of work. Cannot fit a picture
puzzle of more than 16 or 20 pieces.

5. Can do only childish pieces of work if they are new. Will do
very long and complicated pieces of work along lines he has been
working on—as picture puzzles.

6. Can sew for half an hour or so. With the men—those who can
play a game of checkers or billiards but does nothing requiring a longer
time. Leaves task half finished—to take up some other task.

7. Remains interested in a piece of work until the end of the day,
but next morning has forgotten it or has no interest in it.

8. Will work for a day, or day and a half, on a piece of work, and
finish it.

9. Often stops, even for days, in a task requiring a long time but
goes back to it over and over again until it is finished.

10. Plans and carries out a piece of work requiring a long period
of time, as weaving a rug or making a piece of pottery.
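
The summary which Plant draws from each complete rating, namely the mean and the standard deviation over the 16 partitions, traced from one rating to the next, may be sketched as follows. This is an illustration only: the function names and the sample ratings are hypothetical, and the use of the population standard deviation is an assumption, since the source does not say which form Plant employed.

```python
from statistics import mean, pstdev


def summarize_rating(partition_ratings):
    """Return (mean, standard deviation) of one complete rating.

    partition_ratings -- one scale value (about 1 to 10) for each partition
                         of personality rated on that occasion.
    The population standard deviation is an assumption of this sketch.
    """
    return mean(partition_ratings), pstdev(partition_ratings)


def trace_over_time(ratings_by_occasion):
    """Summarize a series of complete ratings, in chronological order."""
    return [summarize_rating(r) for r in ratings_by_occasion]


# Hypothetical data: two occasions, 16 partitions each.
first_occasion = [1, 10] * 8   # widely scattered standings
second_occasion = [5, 6] * 8   # closely grouped standings
for m, sd in trace_over_time([first_occasion, second_occasion]):
    print(f"mean = {m:.2f}, standard deviation = {sd:.2f}")
```

On Plant's interpretation the larger standard deviation of the first occasion would be read as the sign of greater mental confusion.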

In these scales Plant attempts to recapitulate ontogeny in the order
of the sub-divisions. He says regarding this rating scheme: "On its
face we are dealing with a facultative psychology, long since discarded.
That is, however, not the case since the terms are social and not
psychological." In other words, the measurements are of objective
rather than subjective phenomena, in terms of behavior rather than in
terms of introspection, but still psychological. The facultative
objection, however, could be precluded if the 10 divisions represented
steps on a continuous line.

Many similar examples could be cited from the field of educational
measurements, such as the various handwriting scales. In these cases,
however, the steps on the scale are based on the decisions of competent
judges. By invoking the aid of these judges the experimenter is
enabled to add to the accuracy of the steps on his scale.

The above illustrate the possibilities of the use of rating schemes in
pure and applied psychology, and indicate how they render measurement
possible with seemingly incoherent data.

When an objective method of measurement attempts to supplant
a rating method, as in the case of psychological tests, it is easy to lose
sight of the fact that the objective measurement is no more reliable
than the ratings which it displaces. It is important, therefore, when
building up objective methods of measurement to take the place of
ratings, to obtain first of all the most accurate and reliable ratings
possible, by refining the methods whereby ratings are obtained.

The recent development of the graphic rating method warrants
this discussion, in order that the scale may receive wider use, and its
merits and demerits be put to stricter tests.

II 

There are innumerable possibilities in the way of methods of
rating. The list which follows gives some notion of the diversity of
means at hand.

Let us assume that we wish to measure a group of men with regard
to some phase of personality, as self-consciousness. We may obtain
expressions of the degree of self-consciousness in the members of the
group by instructing judges to rate them by any of the following
methods.

1. Have each subject rated as self-conscious, or not self-conscious.

2. Have the subjects arranged in order from those displaying the
greatest self-consciousness to those displaying the least
self-consciousness, and have them rated from 1 to n.

3. Assuming 100 per cent to represent the greatest degree of
self-consciousness possible in any one person, we may have each
individual in the group rated in terms of the degree of
self-consciousness which he possesses.





4. Have the judges select the men who are outstandingly
self-conscious, and the men who are outstandingly not self-conscious.
They may place as many men as they choose in these two classes, or
they may be required to place a certain number of the whole group
in the two classes. These men may be rated as self-conscious or not
self-conscious; or + or −. The rest of the group may be rated as
average or neutral, or be assigned the symbol ?.

5. Have the judges place the men in several groups (3, 5, or 10
are often employed), according to the amount of self-consciousness
which they display. Assign terms to these various groups, as high,
average, and low; or good, fair, and poor; or 1, 2, 3, . . . ; a, b, c,
. . . , etc.

6. The judges may indicate their opinion of the amount of each
man's self-consciousness by checking one of a number of symbols, as
follows: +1 -1; +1 -H ?-1; Y1 Y ? N Nl;
Yy ? nN; Y representing yes and N representing no in answer to
the question: "Is this man self-conscious?"

7. Have each of the judges select 5 members of the group, one
being extremely self-conscious, another not being self-conscious and
the remaining 3 representing intermediate degrees of
self-consciousness. These men should be given ratings of 10, 8, 6, 4,
or 2, according to the amount of their self-consciousness. The judges
may then proceed to rate the remaining members of the group by
assigning each of them a number from 2 to 10 in comparison with these
5 representative men. This is the Army Rating Scale method.

8. Draw a straight line to represent the range of
self-consciousness, and have the judges indicate each man's
self-consciousness by making a cross along this line.

9. A large number of phrases descriptive of varying degrees of
self-consciousness may be collected, and arranged in order as in the
examples cited from Plant and Downey. These phrases may be
numbered from 1 to 6, or from 1 to 10, depending on the number
of phrases. The judges may indicate their rating by checking the
phrase which corresponds most closely to their estimate of the man's
self-consciousness.

10. The graphic rating method is a combination of the two
preceding methods. In this case the rating is indicated by a
check along a straight line, under which are printed descriptive
phrases indicative of varying degree of the trait, from one extreme
to the other.

11. A method which may be employed where ratings on men are
desired in a series of traits, is to pair each of these traits with each of
the others in the series, and to ask the rater which trait in each pair
is the more pronounced in the person rated. The rater may thus
consider the subject more "self-conscious" than "intelligent," if these
traits are among those enumerated. As all traits will appear the same
number of times, the degree to which a certain trait is dominant in
the personality of an individual may be deduced from the frequency
with which that trait is chosen when paired with other traits. The
traits may be ranked for each person in the order of their presence in
him. This hardly constitutes a fair basis for comparing one person
with another in regard to any particular trait, as the traits may retain
the same relative value in two individuals yet all be present to a greater
degree in one person than in the other.
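
The tallying described under method 11 may be sketched in a few lines of Python. The sketch is offered only as an illustration under stated assumptions: the trait names, the `rank_traits` helper, and the rater's choices are all hypothetical and are not drawn from any published scale.

```python
from itertools import combinations


def rank_traits(traits, choices):
    """Rank one person's traits by how often each was chosen.

    traits  -- list of trait names (hypothetical examples below)
    choices -- dict mapping each pair (a, b), in the order produced by
               itertools.combinations(traits, 2), to the trait the rater
               judged the more pronounced in the person rated
    Returns (traits ordered from most to least dominant, tally of choices).
    """
    tally = {trait: 0 for trait in traits}
    for pair in combinations(traits, 2):
        chosen = choices[pair]
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not a member of {pair!r}")
        tally[chosen] += 1
    # Every trait appears in len(traits) - 1 pairs, so tallies are comparable.
    ordered = sorted(traits, key=lambda t: tally[t], reverse=True)
    return ordered, tally


traits = ["self-conscious", "intelligent", "neat", "talkative"]
choices = {
    ("self-conscious", "intelligent"): "self-conscious",
    ("self-conscious", "neat"): "self-conscious",
    ("self-conscious", "talkative"): "self-conscious",
    ("intelligent", "neat"): "intelligent",
    ("intelligent", "talkative"): "intelligent",
    ("neat", "talkative"): "neat",
}
ordered, tally = rank_traits(traits, choices)
print(ordered, tally)
```

The tallies are relative to the one person rated, so two persons may yield the same ordering while differing in the absolute degree of every trait, which is the limitation noted in the text.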

III

When confronted with all these possibilities in the way of rating
methods, we immediately ask ourselves, which is the most effective and
desirable? If some means were at hand for determining without
equivocation and by a method not based on ratings, the exact degree of
a certain trait in all individuals, we should be able to compare the
accuracy of ratings made by the various scales. But if such a method of
objectively measuring the trait were available, we should have no need
for the rating scale. Furthermore, such a trait may differ so
fundamentally from those traits which are not capable of objective
measurement (in the sense that the rater would have more cues) as to
limit the broadness of the conclusions to be drawn.

Ratings are ultimate things, and the comparison of the various
rating systems cannot be found by recourse to an external criterion.
In the writer's opinion there are no flawless methods of evaluating
rating scales. The criteria which have been advanced may be divided
roughly into those which appeal to such factors as ease of
administration and scoring, popularity, and so forth, and those which
employ statistical reasoning.

The non-statistical criteria include such items as the ease with
which one may grasp the directions for making the rating, the time
required for completing the rating, the agreeableness of the rating
task, the simplicity of the scale, the universality of the scale, the ease
with which the rating may be scored, and so forth. These are important
criteria, unless one has access to trained judges with unlimited
patience.

There are approximately seven statistical criteria. One criterion,
used in the case of ratings on intelligence, is the comparison of
ratings with intelligence test scores. Rugg used this criterion (among
others) with the Army Rating Scale, and found negative results. This
may of course be construed as an argument against the narrow
conception of general intelligence admitted in the employment of an
intellectual intelligence test, as well as against the rating method.
One conclusion to be drawn from such results is that more attention is
needed to the determination of the exact abilities measured by tests.

The following from Hayes and Paterson includes two other criteria
of the reliability of a rating scale: "The graphic rating method was
found to be highly reliable, as shown by the close relationship between
ratings on the same men by the same judge for different months, and
by a close relationship between ratings on the same men by different
judges." (Rugg also uses the latter criterion in showing the weakness
of the Army Rating Scale.) Certain cautions are called for when such
criteria are employed. Should a judge's ratings vary or remain
constant from month to month? Under certain conditions, is it not to
be expected that a judge's estimate must change from month to month,
and may not a lack of such change indicate a weakness in the scale?
If a judge rates a person the same on successive occasions it may be
an indication that he has learned nothing new about the subject's
personality in the interval; that he did, and wished to avoid the
detection of his initial misjudgment; or that he did, and the scale
afforded him no means of altering his judgment. In the opinion of the
writer, agreement between judges is a more valid criterion, because if
this agreement exists it is an indication that the scale calls attention
to universally noted characteristics and makes them the basis for the
rating; provided the judges have had equality of opportunity to judge
the subject, and no judges are included toward whom the subject would
display a peculiar attitude. The data from which Hayes and Paterson
draw their conclusions consist of ratings by foremen on their workmen.
We should expect these judges to agree in their ratings of any of their
subordinates, but a different story might be told if these subordinates
were rated by their wives and their fellow-workers. We should expect
agreement among raters who bear the same social and industrial
relationship to the subjects, and this agreement may be considered a
valid criterion of the worth of a rating scale.

Fourth and fifth criteria relate to the form of the distribution of
ratings, its normality and its spread. As to the normality of the
distribution, we have no notion of the true distribution of the trait,
nor could we assume that because the distribution of ratings
corresponded to the true distribution of the trait, the ratings were
correct. We may generalize in the statement that with a large number
of cases a spotty distribution indicates that certain portions of the
scale are being neglected by raters, or that the steps on the scale are
not of equal value. This matter of equalizing the steps on the rating
scale involves considerable labor, and is usually omitted except for
very refined work. If the scores on the rating scales are transmuted
into ranks, nothing is lost by inequalities of steps on the rating
scale. Spread of distribution is an important factor, since we must
have sufficient discrimination between abilities in order to figure
correlation coefficients and to distinguish between one man's ability
and another's. Too great a spread, however, adds to the error of any
single rating.
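
The remark that nothing is lost by unequal steps once scores are transmuted into ranks may be illustrated by a short sketch. The helper `to_ranks` and both sets of scores are hypothetical; ties are given the average of the ranks they occupy, which is one common convention and an assumption here.

```python
def to_ranks(scores):
    """Transmute scores into ranks, rank 1 being the highest score.

    Tied scores share the average of the ranks they jointly occupy
    (an assumed convention; the source does not specify one).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of subjects tied with the subject at i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


# The same five men scored on a scale with equal steps and on one whose
# steps are unequal but preserve the order of the men.
equal_steps = [2, 4, 6, 8, 10]
unequal_steps = [1, 2, 6, 9, 10]
print(to_ranks(equal_steps) == to_ranks(unequal_steps))
```

So long as the unequal steps preserve the order of the men, the ranks are identical on the two scales.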

Thorndike and Rugg call attention to a constant error in ratings,
namely, the tendency for the judge to be influenced in his ratings on
the specific traits of an individual by a general attitude or set toward
that individual. If the judge likes the person, he rates him high in
everything; if he dislikes him, he rates him low in everything. The
error due to the formation of a "halo" about a person is evidenced
when high correlations are found between ratings on unrelated traits.
Thorndike reports finding correlations of +0.68 between intelligence
and leadership, +0.61 between intelligence and physique, and +0.64
between intelligence and character, when one judge rated 137 aviation
cadets on the Army Scale. He also reports an average correlation of
+0.67 between general ability for officer work and so highly specialized
a quality as flying ability, when the same men were rated by eight
judges. The average rater will not rid himself of this bias and
exercise his analytical powers, unless the scale itself aids him to do
so, and the absence of halo may be considered a sixth criterion of the
efficiency of a rating scale. With the same judges rating the same
subjects, that scale which shows the highest correlation between
obviously unrelated traits may be considered the weakest.
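
The sixth criterion may be made concrete by averaging the correlations between every pair of traits rated by one judge on one scale, and comparing that average between scales. The sketch below assumes the product-moment coefficient; the function names and the ratings are invented for illustration and are not Thorndike's or Rugg's data.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mean_intertrait_r(ratings_by_trait):
    """Average correlation over all pairs of traits.

    ratings_by_trait -- dict {trait: [rating of subject 1, subject 2, ...]},
                        all ratings made by the same judge on one scale.
    """
    pairs = combinations(ratings_by_trait, 2)
    rs = [pearson_r(ratings_by_trait[a], ratings_by_trait[b]) for a, b in pairs]
    return sum(rs) / len(rs)


# Invented ratings of the same five men on two hypothetical scales.
scale_a = {"intelligence": [1, 2, 3, 4, 5],
           "physique": [1, 2, 3, 5, 4],
           "leadership": [2, 1, 3, 4, 5]}
scale_b = {"intelligence": [1, 2, 3, 4, 5],
           "physique": [3, 5, 1, 4, 2],
           "leadership": [2, 1, 3, 4, 5]}
print(mean_intertrait_r(scale_a), mean_intertrait_r(scale_b))
```

On this criterion the scale with the higher average between unrelated traits (scale_a in the invented data) would be considered the weaker.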

A seventh criterion, one mentioned by Plant, is as follows: Present
a person with a list of his acquaintances, and with a set of ratings on
one of the men in the list. Ask him to indicate which of the men is
the one to whom the ratings apply.

IV

The graphic rating method, which was originated in the Scott Co.
Laboratory in 1920, is the latest development in rating methods and
promises to be the most popular. Its only original feature is the
combination of the methods of rating on a line and by checking
descriptive terms, both of which were in prior existence. Several
graphic rating scales are now being used by the Scott Company and by
the Bureau of Personnel Research at Carnegie Institute of Technology.
Among Terman's materials for the study of gifted children may be found
some modifications of the graphic rating method.

The following directions and illustrative items are taken from a 
scale developed by the writer at Carnegie Institute of Technology.* 

GRAPHIC RATING OF ...

Instructions for Using the Rating Scale

1. Let these ratings represent your own judgments. Please do not
consult anyone in making them.

2. In rating this person on a particular trait, disregard every other
trait but that one. Many ratings are rendered valueless because the
rater allows himself to be influenced by a general favorable or
unfavorable impression which he has formed of the person.

3. When you have satisfied yourself on the standing of this person in
the trait on which you are rating him, place a check at the appropriate
point on the horizontal line. You do not have to place your check
directly above a descriptive phrase. You may place your check at any
point on the line.

3. Does he appear neat or slovenly in his dress?

Extremely neat and clean. Almost a dude. | Appropriately and neatly dressed | Inconspicuous in dress | Somewhat careless in his dress | Very slovenly and unkempt

9. How does he impress people by his physique and bearing?

Looked down on | Unimpressive physique and bearing | Noticeable for good physique and bearing | Excites admiration. Very impressive

13. How flexible is he?

Hidebound. Runs in a rut | Slow to take up new ideas | Progressive tendencies | Quick to pick up new ways and habits | Is always adapting himself and taking up new ideas

18. Is he quiet or talkative?

Talks seldom. When questioned answers briefly | Does not uphold his end of the conversation | Moderately talkative | More than upholds his end of the conversation | Great talker. Always going

* This scale will be given in full in a forthcoming monograph by the writer.

This Rating Scale afforded ratings on 20 such traits. In 10 of the
20 traits the "good" end of the scale was at the left. The 20 ratings
on any individual may be made in about 10 minutes.

To score the ratings a stencil was used, one of whose edges was 
marked off for a distance equal to the length of the graphic rating line. 
This space was divided off into 20 consecutively numbered spaces of 
equal length. The stencil was placed beneath the line on which the
rating was made, so that this line coincided with the marked off space 
on the stencil. The score was the number of the space over which the 
check was made. An X-shaped check was scored at the intersection of 
the two lines; a V-shaped check at the point of the V. If more than one 
check was made on the same scale for the same subject, indicating 
doubt on the part of the rater, the average of the ratings was taken. 
No total score was obtained, as the scale was not used in any study 
requiring this step. 
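
The scoring procedure just described may be sketched as follows. The sketch assumes that the stencil spaces are numbered from the left end of the line, and that a check is recorded as its distance from that end; the function names and the 100-unit line are hypothetical.

```python
def stencil_score(position, line_length, n_spaces=20):
    """Return the number of the stencil space over which a check falls.

    position    -- distance of the check from the left end of the line
                   (for an X, the intersection; for a V, its point)
    line_length -- length of the rating line, in the same units
    n_spaces    -- equal spaces marked on the stencil (20 in the text)
    Spaces are assumed to be numbered 1..n_spaces from the left.
    """
    if not 0 <= position <= line_length:
        raise ValueError("check lies off the rating line")
    space_width = line_length / n_spaces
    # A check exactly on the right-hand end belongs to the last space.
    return min(int(position // space_width) + 1, n_spaces)


def score_rating(check_positions, line_length, n_spaces=20):
    """Score one trait; several checks on the same line are averaged."""
    scores = [stencil_score(p, line_length, n_spaces) for p in check_positions]
    return sum(scores) / len(scores)


# A doubtful rater has made two checks on a line 100 units long.
print(score_rating([37, 52], line_length=100))
```

Passing a smaller `n_spaces` reproduces a coarser stencil, such as the 10 spaces of the Scott Co. scales.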

The scales in use by the Scott Co. differ from this one in several
respects. In their scales the "good" end is always at the left; scores
range from 1 to 10 instead of 20; and a total score is obtained on a
number of traits. In order to correct for the tendency for some raters
to rate too high and others too low, a distribution table of total ratings
is made for each rater. The ratings of different judges are equalized
by assigning the same numerical value to the upper 10 per cent of
each judge's distribution, the next 20 per cent, the middle 40 per cent,
the next 20 per cent, and the lowest 10 per cent. Each subject's final
rating, then, represents not his raw score in the scale, but his standing
in terms of each judge's standards of ratings.
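
The equalization of judges may be sketched as below. The percentages of the five bands are those given in the text; the values 5 to 1 attached to them, the function name, and the sample totals are assumptions made for illustration, since the numerical values actually used by the Scott Co. are not stated here.

```python
def equalize_judge(total_ratings,
                   bands=((10, 5), (20, 4), (40, 3), (20, 2), (10, 1))):
    """Convert one judge's total ratings to standings within that judge.

    total_ratings -- dict {subject: total rating given by this judge}
    bands         -- (per cent of the judge's distribution, value assigned),
                     from the highest ratings downward; the values 5..1
                     are hypothetical
    Ties are broken by the order in which subjects were entered.
    """
    ranked = sorted(total_ratings, key=total_ratings.get, reverse=True)
    n = len(ranked)
    equalized = {}
    start = 0
    cumulative_pct = 0
    for pct, value in bands:
        cumulative_pct += pct
        end = round(cumulative_pct * n / 100)
        for subject in ranked[start:end]:
            equalized[subject] = value
        start = end
    return equalized


# Invented totals: one lenient and one severe judge rating the same ten men.
lenient = {f"man{i}": 60 + 4 * i for i in range(10)}
severe = {f"man{i}": 20 + 3 * i for i in range(10)}
print(equalize_judge(lenient) == equalize_judge(severe))
```

Because each man's standing is taken within his own judge's distribution, the lenient and the severe judge in the invented data yield identical final ratings.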

V 

What advantages does this method of rating have over other methods? 

The two basic features of the graphic rating method, according to
Hayes and Paterson, are the following:

"1. The rater is freed from direct quantitative terms in judging
men."

"2. The rater can make as fine a discrimination of merit as he
chooses."

According to these writers, the scale is "simple, self-explanatory,
concrete and definite."

Scott asserts that people like to use this sort of scale. Any scale
which makes the rating task interesting is of advantage when rating
scales are to be returned by mail, or when there is no good incentive for
their use by the judges.

The graphic rating method has the following general advantages:

It is simple and easily grasped.

It is interesting and requires little motivation of the rater.

It is quickly filled out.

It is simply and easily scored.

It frees the rater from direct quantitative terms.

It enables the rater, nevertheless, to make his discrimination as fine
as he cares, although this discrimination is lost if a scoring stencil of
only a few points is used.

The descriptive terms aid the rater in that they make the various
degrees of the trait more concrete. Many scales call for ratings on such
qualities as "neatness," giving merely a definition of this word.

It is universal; that is, no master scale is required as in the Army
Scale. When a group is rated by several judges, corrections may be
made for varying standards of the judges by the Scott Co. method.

The fineness of the scoring method may be altered at will, yielding
scores of from 1 to 6, or from 1 to 100.

It allows of comparable ratings without requiring each rater
to know all the members of the group.

The scale yields a "close relationship between ratings on the same
men by the same judge for different months." The data for this
conclusion are presented in one of the Scott Co. bulletins, and consist of
three sets of ratings on a number of workmen by nine foremen, at
intervals of a month. The average correlation of ratings for the first
month with those for the second month is +0.76 (the lowest is +0.62).
The average correlation for the second and third months is +0.87 (the
lowest is +0.66).

The scale yields a "close relationship between ratings on the same
men by different judges." The Scott Co. bulletins give the
correlations obtained between the ratings of pairs of foremen on their
workers. The average correlation between seven pairs of foremen is
+0.71. These were the first of a series of monthly ratings of their
workers.

The correlation coefficients given in the above two paragraphs were
obtained with total ratings on seven traits. No indication is given by
the authors of the degree of relationship between ratings on specific
traits.
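
The two reliability figures above are averages of correlation coefficients, one coefficient for each judge across months or for each pair of judges. A minimal sketch of that arithmetic follows. It assumes the product-moment coefficient and a simple arithmetic mean of the coefficients, neither of which is specified in the passage; the ratings are invented and are not the Scott Co. data.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two sets of total ratings."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists of at least two ratings")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_r(paired_ratings):
    """Mean of the coefficients for several (x, y) pairs of rating lists."""
    rs = [pearson_r(x, y) for x, y in paired_ratings]
    return sum(rs) / len(rs)


# Invented total ratings of the same six workmen by two pairs of foremen.
pair_1 = ([52, 47, 61, 38, 55, 44], [49, 45, 63, 41, 50, 46])
pair_2 = ([30, 42, 35, 58, 40, 51], [33, 40, 31, 55, 47, 49])
print(average_r([pair_1, pair_2]))
```

The same two functions serve for the month-to-month comparison, with each pair consisting of one judge's ratings in successive months.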

The form of the distributions of ratings will vary with the
construction of the scale, as well as with the true distribution of the
trait among the persons rated. With poorly constructed scales the checks
are for the most part made directly above the descriptive phrases.
Figure 1 shows the distribution of 100 self-ratings by college students
on trait 3 of the writer's scale. (The manner in which these figures
were obtained will be described later.) This scale is defective in that
the central phrase does not describe the average amount of the trait.
In Fig. 2 the distribution shows greater spread, but is spotty due to the
fact that ratings were made for the most part directly above phrases.
In Fig. 3 the tendency to check above phrases is diminished and there is
a greater spread in the distribution. These ratings were all made under
the same circumstances and with the use of the same directions.
The difference between the distribution in Fig. 2 and that in Fig. 3 can
only be explained by the assumption that the scale for trait 16 is a
superior scale. The Scott Co. bulletins report symmetrical
distributions of ratings, with the use of 10 scores instead of 20.

Fig. 1.—Distribution of self-ratings by one hundred college students in Trait 3 of the
Graphic Rating Scale.

Fig. 2.—Distribution of self-ratings by one hundred college students in Trait 4 of the
Graphic Rating Scale.

With regard to the elimination of halo, little can be said that is
definite. Theoretically, we should expect a diminution of the halo
with the use of the scale herein described, since the directions clearly
explain this tendency and warn against its presence; since both
extremes of a scale may represent undesirable qualities; and since the
items are alternated so that a motor tendency to check at one side of
the sheet is seemingly eliminated. The traits, furthermore, are not
described only in terms of their desirable extreme.
Fig. 3. Distribution of self-ratings by one hundred College students in Trait 16 of the Graphic Rating Scale.

Some data which the writer has collected may throw a little light 
on the interrelationship of abilities as rated on the graphic rating 
scale, although the manner in which the data have been developed 
admittedly leaves some doubt as to the finality of the conclusions to be 
drawn from them. They are not directly comparable with the data 
presented by Thorndike to show the presence of halo. 

Each of 100 college students was asked to rate himself on the scale
presented in this article, by considering himself from the standpoint of
an impartial observer. (These are the men some of whose self-ratings
are shown in Figs. 1, 2, and 3.) Similar scales were then sent to five
of the acquaintances of each of these men, but with rare exceptions,
no two men were rated by the same judge. All self-ratings and ratings
of others were made anonymously. All individuals were excluded from
further study on whom less than two such ratings by acquaintances were
available. There remained 84 students all of whom had rated themselves,
and who had been rated on the same scale by from two to five
acquaintances. The ratings of acquaintances for each of the students
were then averaged. This average rating on each trait for each student
was then averaged with the student's own rating on each trait. The
result was a series of 20 measures of each student, in which his self-rating
and the average rating of his acquaintances were given equal
weight. These final average ratings were intercorrelated, resulting
in 190 correlation coefficients (Table I).
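
The averaging procedure just described may be sketched as follows. This is a minimal illustration only: the function name and the two-trait ratings are hypothetical and are not the writer's data. It also confirms that 20 traits taken two at a time give 190 pairs.

```python
from itertools import combinations
from statistics import mean

def final_ratings(self_rating, acquaintance_ratings):
    """Average the acquaintances' ratings trait by trait, then average
    that result with the self-rating, giving the two equal weight."""
    n_traits = len(self_rating)
    others = [mean(r[t] for r in acquaintance_ratings) for t in range(n_traits)]
    return [(s + o) / 2 for s, o in zip(self_rating, others)]

# Hypothetical student rated on two traits by himself and two acquaintances.
example = final_ratings([10, 14], [[14, 16], [10, 12]])

# Number of distinct pairs among 20 traits.
n_pairs = len(list(combinations(range(20), 2)))
```

Giving the self-rating the same weight as the whole group of acquaintances means that a student with five judges and one with two are treated alike at this step.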

These intercorrelations were all computed by the fourfold table
method (method of unlike-signed pairs). Furthermore, in obtaining
these correlation coefficients, one extreme of the scale was considered
the "good" end by an arbitrary decision. The "good" name was
given to the trait, and a rating at the good end was considered high,
and at the bad end, low. This necessitated in half the cases reversing
the scores obtained by the scoring stencil, since the values on this
stencil ranged in size from left to right only. The effect is as if the good
end of the scales were always at the right margin instead of being
alternated, and as if the trait were described by the extreme at the right
margin.
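
A sketch of the two steps named above, under stated assumptions. The text does not print the formula used; the version below assumes the cosine form commonly attributed to Sheppard, r = cos(pi * U / (L + U)), where U and L are the numbers of unlike-signed and like-signed pairs of deviations from the two means. The 1-to-20 stencil range used for reversing a score is likewise an assumption, and both function names are hypothetical.

```python
import math

def unlike_signs_r(x, y):
    """Correlation estimated from the share of unlike-signed pairs of
    deviations (assumed cosine formula; pairs lying on a mean are skipped)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    products = [(a - mx) * (b - my) for a, b in zip(x, y)]
    unlike = sum(1 for p in products if p < 0)
    like = sum(1 for p in products if p > 0)
    return math.cos(math.pi * unlike / (like + unlike))

def reverse_score(score, top=20):
    """Turn a left-to-right stencil value into a right-to-left one,
    assuming stencil values run from 1 to `top`."""
    return top + 1 - score
```

With no unlike-signed pairs the estimate is +1, with none alike it is -1, and with the two kinds equally frequent it is zero.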

If the tendency to form a halo about a person were common to
all judges, we should expect the judges to rate people on the same
dead level in all traits. A person would thus be rated consistently
high, consistently low, or consistently mediocre, and this situation
would not be modified if averages of various ratings on a person were
used instead of the ratings of only one judge: the averages would
nevertheless be consistent. The result would be an exaggerated intercorrelation
of various traits as measured by ratings, provided the
favorable extreme of the trait were always considered high. This is
the phenomenon to which Thorndike has referred. Our procedure is
postulated on the phenomenon of halo being as probable in self-ratings
as in the ratings of others.

The intercorrelations thus obtained are given in Table I, and
Table II shows the frequency distribution of these correlations. High
coefficients are the exception. The highest is between flexibility
(adaptability) and quickness in work (+0.66), perhaps higher than
we should expect. The other positive correlations of 0.51 or more
are between quickness in work and present-mindedness, good-nature
and even-temper, cool-headedness and cautiousness, freedom from
self-consciousness and good-bearing, freedom from self-consciousness and
sociability with the other sex, sociability and sociability with the other
sex, sociability and talkativeness, open-heartedness and talkativeness,
sociability with the other sex and talkativeness. There is a correlation
of -0.54 between appreciation of others and talkativeness.
With one or two possible exceptions, these results do not exceed
expectation.

Insofar as these figures are concerned, it may be safely ventured 
that the graphic rating scale tends to eliminate halo. 

Table II. Distribution of Correlation Coefficients Given in Table I

Amount of correlation coefficient     Frequency
-.51 to -.60                              1
-.41 to -.50                              0
-.31 to -.40                              3
-.21 to -.30                              6
-.11 to -.20                             20
-.01 to -.10                             10
 .00                                      3
+.01 to +.10                             28
+.11 to +.20                             42
+.21 to +.30                             32
+.31 to +.40                             26
+.41 to +.50                              7
+.51 to +.60                             11
+.61 to +.70                              1


VI 

There are certain rules, based on experience, which it is well to 
follow in constructing a graphic rating scale. The following is a fairly 
complete list of points to be considered in making a scale. 

Define the trait on which you wish ratings. It will often be found 
that what one considered a single trait is in reality composed of a 
number of well-defined separate traits. Define the trait in terms of 
what the individual actually does, for the more concretely the trait is
expressed, the greater is the expectation that raters will agree. 

Decide on the extremes of the trait. It is frequently the case that 
one extreme of a scale may have several opposites. 

It will be found good practice to introduce every scale with a
question, to which the rating furnishes the answer; for instance, the
question "How tactful is he?" or "Is he tactful or tactless?" may be
answered by checking on the rating line.

The rating line should be of such a length that a stencil for scoring
the rating can easily be calibrated.

There should be no breaks or divisions in the line.

The line should not be much more than five inches in length, so
that it may be grasped as a unit.

There should not be more than five descriptive items nor less than
three.

The end phrases of the scale should not be so extremely worded
as never to be employed.

The phrase descriptive of the neutral or average degree of the trait 
should be in the center of the scale. 

If there are five items, the intermediate ones should be closer in
meaning to the center one than to the extremes. This has the effect
of spreading the distribution. The same end may be accomplished
by making the intervals on the scoring stencil smaller in the center
than at the ends.

Only universally understood phrases should be used. Slang is
effective if there is no doubt as to its meaning.

Such terms as average, very, extremely, excellent, good, fair, or poor,
should be used sparingly. Use in their place adjectives which in
themselves express the varying degrees of the trait. In place of
extremely neat one might say fastidious, or in place of very careless in
dress one might say slovenly.

The descriptive phrases should be short and to the point.

These phrases should be set in small type, and there should be 
plenty of white space between them. 

The favorable extremes of the scales should be alternated so as to
do away with a motor tendency to check at one margin of the page.

The construction of stencils for scoring the scales will depend on
the purposes for which these scores are to be used. If correlations
with other variables are desired, the same stencil may be used, reading
from left to right. If a total score is desired on any one individual,
two or more stencils will be necessary; one reading from left to right,
one from right to left, and one perhaps reading from the center in
both directions, in case it has been found that the central phrase
describes the most favorable degree of the trait for any special vocational
purpose. There are innumerable possibilities in the way of
handling the data.
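
The three stencil readings mentioned above can be sketched as follows. The five-inch line comes from the rules given earlier; the 20 equal steps, the function names, and the particular way the center-outward stencil is graded are assumptions made for illustration and are not the writer's stencils.

```python
import math

LINE_INCHES = 5.0   # length of the rating line
STEPS = 20          # assumed number of score values on the stencil

def left_to_right(pos):
    """Score a check mark `pos` inches from the left end, 1 (left) to STEPS (right)."""
    step = math.ceil(pos / LINE_INCHES * STEPS)
    return min(max(step, 1), STEPS)

def right_to_left(pos):
    """Same mark read with the stencil reversed."""
    return STEPS + 1 - left_to_right(pos)

def center_out(pos):
    """Highest score at the center, falling off toward either end
    (hypothetical grading for a trait whose middle degree is most favorable)."""
    half = LINE_INCHES / 2
    distance = abs(pos - half) / half          # 0 at center, 1 at either end
    return max(1, STEPS - math.floor(distance * STEPS))
```

Unequal intervals, smaller in the center than at the ends, could be had by replacing the equal steps with a table of cut points; the equal-step form is kept here only for brevity.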

Rugg voices some cautions which must be observed in using the
Army type of scale, two of which may apply to any sort of scale:* (1)
That the final rating be an average of three independent ratings,
(2) made by men who are thoroughly acquainted with the persons to
be rated. Raters should be selected with the same care as a jury,
and those who are subject to prejudice or bias, or who have numerous
complexes, should be eliminated. In judging the character of a person
it is important to know that others are biased against him, but it is
more important to know the cause of that bias, and we cannot depend
upon the biased raters for this information. The true reasons for
dislike are often hidden by rationalizations.

VII 

Graphic rating scales may find use in a variety of psychological
studies. The following are some of the uses to which they may be put.

As criteria in evaluating tests of personality. 

As criteria in evaluating vocational tests. 

For measuring test responses. 

For rating applicants for positions on traits which are at present 
impossible of objective measurement. 

For rating improvement in an employee. 

For vocational guidance. 

For rating clinical cases.

For rating children on deportment.

For measuring the effect of drugs and other variables on efficiency.

For any psychological experimentation involving verbal reports.

The use of the scale will necessitate definite concrete thinking on
the problem, and will aid in analysis and the avoidance of snap
judgments.
THE UNRELIABILITY OF THE DIFFERENCE BETWEEN
INTELLIGENCE AND EDUCATIONAL
RATINGS

J. CROSBY CHAPMAN
Department of Education, Yale University

It is the irony of fate that those who, five years ago, were attempting
to overcome the scepticism of practical schoolmen with reference to
mental and educational tests should within so short a period have to
warn the erstwhile sceptics that they are now putting too much faith
in the instruments which such a short while back were the objects of
their distrust. With the entrance of "intelligence tests" and "school
tests," it was a great temptation to "measure" "intelligence" and
"school achievement," and then by the difference in standing to
estimate the extent to which an individual was taking advantage of
his school opportunity. The general idea is so attractive and the
results, if true, so useful that schoolmen have been captivated by the
simplicity of a definite figure which promised to give such valuable
information with regard to the pupil and the school. Provided
sufficiently accurate differential instruments are available, no one doubts
that the procedure is most useful, but in the absence of such instruments
I have been much shocked by the rigid manner in which the
differences in intelligence level and school level, resulting from single
tests of each, have been interpreted. The single measure of intelligence
and the single measure of school achievement have both been
treated as though they were the ratings of two isolated traits made by
an hypothetical but infinitely wise judge. This work promises to
spread so rapidly that it seems advisable to issue certain caveats which
are the result of an examination of its logical and statistical basis.

While lip service is given to more exacting definitions of intelligence,
most of the devices in common use make no pretence at measuring
rate of learning in a controlled situation and under specific time
conditions, but are content to assume that the amount learned, or
facility acquired from the experiences afforded by a supposedly
fairly uniform environment gives the most reliable clue to intelligence.
Accepting this idea, that the extent of the benefit derived
from past experience will be used as the test of intelligence, where can
we find for any grade a more uniform environment than that provided
by the school? Therefore it follows that the achievement in
standard school tests by its very nature must be a fairly satisfactory
index of intelligence. The psychologist sits in his laboratory and
devises "Intelligence Tests" and his friend, the educator, dropping in,
remarks on the excellence of a considerable part of his material as a
test of school achievement. The following day the educator sits in his
schoolroom, constructing his "School Achievement Tests" and his
friend the psychologist later comments on the suitability of large
portions of his material as measures of what he considers to be intelligence.
In spite of this amusing agreement, the two tests emerge, one
definitely labelled "Intelligence Test" and the other equally definitely
labelled "School Products Test." Under the assurance afforded by
these definite labels, their partially similar nature and common origin
are soon forgotten. Without any compunction "Intelligence Ratings"
and "School Achievement Ratings" are treated as measures of
the raw material involved and the quality of the finished product
respectively! To a certain degree the two tests measure the same
traits and whatever traits they measure they are themselves unreliable,
as is shown by the repetition of each using identical forms, yet the
difference in achievement in the tests is made the basis of a
Mental-educational-differential Index or of a Mental-educational-achievement
quotient. The injustice of many decisions made on such assumptions
has caused me to calculate theoretically for a population made up of
a single grade the reliability which can be placed in a measure which
is the difference between achievement in the so-called intelligence test,
and achievement in the so-called school products test. The remainder
of the paper confines itself to the derivation of the general formula
and the application of this formula to several problems. It may be
said that the results indicate how extremely sceptical we must be of our
present practices.

Suppose two standard intelligence tests, such as any two of the following
(National, Otis, Pressey, Terman, etc.), and two standard
classroom products survey tests, such as the Lippincott-Chapman
Test or the Illinois Examination or any other similar combinations are
given to a group. Let the two intelligence tests be labelled $I_1$ and $I_2$
and the two school battery tests $S_3$ and $S_4$ and let the scores of each
individual in these four tests when expressed as deviations from the
means be $x_1$, $x_2$, $x_3$, $x_4$. Then the reliability of the measure of the
so-called differential achievement in intelligence and school work is
dependent on the degree of the correlation between the differences
when found with one set of measures, say $I_1$ and $S_3$, and the difference
when found by a second set (supposed equally acceptable) $I_2$ and $S_4$.
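Returning to the symbolic notation, the trust that we can put in the
procedure is dependent on the degree of correlation between

$$\frac{x_1}{\sigma_1} - \frac{x_3}{\sigma_3} \quad \text{and} \quad \frac{x_2}{\sigma_2} - \frac{x_4}{\sigma_4}$$

Let the symbolic representation of the correlation be $r_{d_1 d_2}$
and let correlation between

$I_1$ and $I_2$ = $r_{12}$
$S_3$ and $S_4$ = $r_{34}$ etc. etc.

Then

$$r_{d_1 d_2} = \frac{\sum \left(\frac{x_1}{\sigma_1} - \frac{x_3}{\sigma_3}\right)\left(\frac{x_2}{\sigma_2} - \frac{x_4}{\sigma_4}\right)}{\left[\sum \left(\frac{x_1}{\sigma_1} - \frac{x_3}{\sigma_3}\right)^2 \sum \left(\frac{x_2}{\sigma_2} - \frac{x_4}{\sigma_4}\right)^2\right]^{1/2}}$$

whence, since

$$\frac{\sum x_1 x_2}{\sigma_1 \sigma_2} = n r_{12} \ \text{etc., also} \ \frac{\sum x_1^2}{\sigma_1^2} = n,$$

$$r_{d_1 d_2} = \frac{n r_{12} + n r_{34} - n r_{23} - n r_{14}}{(n - 2 n r_{13} + n)^{1/2} (n - 2 n r_{24} + n)^{1/2}} = \frac{r_{12} + r_{34} - r_{23} - r_{14}}{2 (1 - r_{13})^{1/2} (1 - r_{24})^{1/2}} \qquad (X)$$

Formula X reduces to simpler terms if we make the assumption, which
is in general accordance with the fact, that

$$r_{13} = r_{14} = r_{23} = r_{24}$$

In this case the equation becomes

$$r_{d_1 d_2} = \frac{\frac{r_{12} + r_{34}}{2} - r_{13}}{1 - r_{13}}$$

Rewriting this, putting 1 = intelligence $I_1$
2 = intelligence $I_2$
3 = school test $S_3$
4 = school test $S_4$

$$r_{d_1 d_2} = \frac{\frac{r_{I_1 I_2} + r_{S_3 S_4}}{2} - r_{I_1 S_4}}{1 - r_{I_1 S_4}}$$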

106 The Journal of Educational Psychology


or neglecting subscripts

$$r_{d_1 d_2} = \frac{\frac{r_{II} + r_{SS}}{2} - r_{IS}}{1 - r_{IS}} \qquad (Y)$$


Applying Formula X to certain problems, we select first some data
in my possession obtained from a group of 208 unselected Grade VII
pupils.

Correlations

National Intelligence and Pressey Intelligence       $r_{12}$ = .48
National Intelligence and School Product Test¹       $r_{13}$ = .61 = $r_{14}$
Pressey Intelligence and School Product Test         $r_{23}$ = .54 = $r_{24}$
School Product Test and School Product Test          $r_{34}$ = .75 (assumed)

For these data

$$r_{d_1 d_2} = \frac{.48 + .75 - .54 - .61}{2 (1 - .61)^{1/2} (1 - .54)^{1/2}} = .094$$
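
Formula X lends itself to a direct check. The sketch below is an illustration only; the function name is hypothetical, and the Problem I values are entered with $r_{23} = r_{24} = .54$, which is the reading under which the printed result of .094 is reproduced.

```python
import math

def differential_reliability(r12, r34, r13, r14, r23, r24):
    """Formula X: correlation between the difference (I1 - S3) and the
    difference (I2 - S4), all scores taken in standard-deviation units."""
    numerator = r12 + r34 - r23 - r14
    denominator = 2 * math.sqrt(1 - r13) * math.sqrt(1 - r24)
    return numerator / denominator

# Problem I (Grade VII pupils); r34 = .75 is the value assumed in the text.
problem_1 = differential_reliability(
    r12=0.48, r34=0.75, r13=0.61, r14=0.61, r23=0.54, r24=0.54
)
print(round(problem_1, 3))  # 0.094
```

The numerator is small whenever the cross-correlations between intelligence and school tests approach the self-correlations of the tests, which is the situation the paper describes.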

Taking similar data from two studies by Gates,² which, although issuing
from selected groups, show the unreliability of the differential
procedure, we have:

Average correlation for each grade of Grades VI, VII, VIII.

Otis Intelligence and National Intelligence              $r_{12}$ = .60
National Intelligence and Thorndike-McCall Reading       $r_{13}$ = .46 = $r_{14}$
Otis Intelligence and Thorndike-McCall Reading           $r_{23}$ = .67 = $r_{24}$
Thorndike-McCall and Thorndike-McCall Reading            $r_{34}$ = .69
whence

$r_{d_1 d_2}$ = .164

For another group of tests, taking again average correlations for
several grades

Illinois Intelligence and Otis Intelligence              $r_{12}$ = .54
Illinois Intelligence and Thorndike-McCall Reading       $r_{13}$ = .53 = $r_{14}$
Otis Intelligence and Thorndike-McCall Reading           $r_{23}$ = .61 = $r_{24}$
Thorndike-McCall and Thorndike-McCall Reading            $r_{34}$ = .57
whence 


¹ The School Product Test was the Lippincott Classroom Products Summary
Test consisting of four parts.

² Study of Reading Tests, Journal of Educational Psychology, October, 1921.
Correlation of Achievement in School Subjects with the Intelligence Tests and
other Variables. Journal of Educational Psychology, April, 1922.
The low value of $r_{d_1 d_2}$ in both cases proves how unreliable is any verdict
within a grade group which involves intelligence measured by a
single test and reading ability also measured by a single test.

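The standard error (root mean square error) made in estimating

$$\frac{x_2}{\sigma_2} - \frac{x_4}{\sigma_4} \quad \text{from} \quad \frac{x_1}{\sigma_1} - \frac{x_3}{\sigma_3}$$

by the relationship

$$\frac{x_2}{\sigma_2} - \frac{x_4}{\sigma_4} = r_{d_1 d_2} \, \frac{\sigma_{d_2}}{\sigma_{d_1}} \left( \frac{x_1}{\sigma_1} - \frac{x_3}{\sigma_3} \right)$$

is given below, $\sigma_{d_1}$ and $\sigma_{d_2}$ being the standard deviations of the two differences.

$$\text{Standard error} = \sigma_{d_2} \left( 1 - r_{d_1 d_2}^{2} \right)^{1/2}$$

Substituting for the three problems considered

Problem I. Standard error = $\sigma_{d_2} (1 - .009)^{1/2}$

Problem II. Standard error = $\sigma_{d_2} (1 - .028)^{1/2}$

Problem III. Standard error = $\sigma_{d_2} (1 - .001)^{1/2}$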
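
How little the error is reduced may be seen by evaluating the factor $(1 - r^2)^{1/2}$ that multiplies the standard deviation of the difference. The following lines are a sketch only; the function name is hypothetical, and the value .094 is the Problem I coefficient quoted earlier.

```python
import math

def error_factor(r):
    """Fraction of the standard deviation that remains as standard error
    when one measure is predicted from another correlated r with it."""
    return math.sqrt(1 - r * r)

factor_problem_1 = error_factor(0.094)
print(round(factor_problem_1, 3))  # 0.996
```

A factor so near unity means that knowing the first difference leaves the error of estimate practically as large as the whole spread of the second difference.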


From these three values of the standard error it is apparent that the
difference in standing in a single test of intelligence and a single school
achievement test gives almost no basis of prediction within a typical
grade group of what the difference will be when two other similar
tests designed to measure the same factors are employed.

Assuming that the intelligence test, even though unreliable, is a
perfectly valid intelligence measure, and that the school battery test,
even though unreliable, is again a perfectly valid school measure,
Formula Y provides us with a means whereby we can calculate the
accuracy with which the intelligence test and the school test must
function, if we are to make predictions of the differential educational
index with a degree of probability which a correlation coefficient
$r_{d_1 d_2}$ = .75 allows. This, be it noted, is still an insecure basis of
prediction.

We will assume, as is reasonable, that the true correlation of the
ideal intelligence test and the ideal school test is .7.
Then substituting in formula Y

$$.75 = \frac{\frac{r_{II} + r_{SS}}{2} - .7}{1 - .7}$$

whence $r_{II} + r_{SS}$ = 1.85

or if $r_{II} = r_{SS}$
then $r_{II} = r_{SS}$ = .93

Within a population of a grade the various intelligence tests among
themselves and the various school tests among themselves certainly
cannot be relied upon to correlate better than .7. Using Brown's
formula we may therefore calculate, on the above assumption that each
is a perfectly valid measure, how often we must repeat the measure
of intelligence and the measure of school achievement to get the
necessary correlation of .93.

Let n = number of times the tests must be repeated, then

$$.93 = \frac{.7 n}{1 + (n - 1)(.7)}$$

whence n = 6 (approx.)
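
The two steps of this argument can be retraced numerically. The sketch below is illustrative; both function names are hypothetical. The first solves Formula Y for the sum of the two self-correlations, and the second solves Brown's formula, R = nr / (1 + (n - 1)r), for the number of repetitions n.

```python
import math

def required_reliability_sum(r_dd, r_is):
    """Solve Formula Y for r_II + r_SS, given the desired reliability of the
    difference (r_dd) and the intelligence-school correlation (r_is)."""
    return 2 * (r_dd * (1 - r_is) + r_is)

def brown_repetitions(r_single, r_target):
    """Number of repetitions n for which Brown's formula raises a single-test
    self-correlation r_single to r_target."""
    return r_target * (1 - r_single) / (r_single * (1 - r_target))

total = required_reliability_sum(0.75, 0.7)      # r_II + r_SS
n_exact = brown_repetitions(0.7, 0.93)           # fractional repetitions
n_tests = math.ceil(n_exact)                     # whole tests needed
```

Since a fraction of a test cannot be given, the result is rounded upward to the next whole number of repetitions.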

Such facts as are presented above must be recognized by those who
propose to determine the difference within a single grade, of intellectual
and school achievements when measured by such instruments as are
at present available. The psychologist must restrain the ardor of the
commercial houses when making fanciful claims for the efficiency of
their tests. These claims may be sanctioned in the realms of commercial
advertising but they can never be justified at the more exacting
bar of statistical truth.



IS IT NECESSARY TO WEIGHT EXERCISES IN 
STANDARD TESTS? 

KARL R. DOUGLASS 
Professor of Education, University of Oregon

AND 

PETER L. SPENCER 
University High School, University of Oregon 

The Problem. The purpose of this article is to raise the question of
whether it is necessary to weight the separate exercises or questions
that go to make up a standard test and to give some statistical data
bearing on the question raised.

Test and scale papers are usually scored in one of three ways.
When scales are used the most difficult exercise completed correctly
in the case of difficulty scales, or the sample highest in quality in the
case of quality scales is taken as the measure of the specimen being
scored. Among those of this type of measuring instrument are the
Starch scales for English grammar and for arithmetic, the Thorndike
Drawing scale, the Ayres and the Thorndike scales for measuring
handwriting, the Willing, the Harvard-Newton and the Hillegas scales
for measuring the quality of composition. These two types of measuring
instruments are usually referred to as (1) difficulty or power scales
and (2) scales for quality.

In scoring papers in standard tests a cumulative method of scoring
is used. Tests with respect to method of scoring are of two classes
according to whether the unit exercises or questions making up the
test are weighted or not. Some tests employ the simple method of
counting each unit exercise or question as possessing unitary value.
This is true of practically all tests where speed rather than power is
the thing measured and of all tests where the problems or exercises
are selected so as to be of equal difficulty. Of this type are the Courtis
Research Tests, the Courtis Supervisory Tests, the Monroe Standard
Tests for Algebra, the Monroe Diagnostic Tests for Arithmetic, the
Handschin Tests for French and for Spanish.

Many standard tests are made up of exercises, problems or questions
of unequal difficulty and value in order that adequate opportunity
for diagnosis and for measurement of various degrees of power might
be provided, or in some cases, where it was found impractical to obtain
exercises, problems or questions of equal difficulty and requiring equal
amounts of time for completion. Of this class of tests are the Monroe
Standardized Silent Reading Tests, the Stone Reasoning Tests, the
Kansas Silent Reading Tests, the Henmon Latin Tests, and the
Douglass Algebra Tests.

The cumulative method employed in scoring these tests involves
the weighting of the unit problems, exercises or questions. The
weights are usually expressed in units of P.E. (M.D.) or S.D. recalculated
with reference to an arbitrarily determined zero point. The methods
used in the determination of these weights, with slight variations with
regard to the arbitrary determination of the zero point, are practically
uniform and established by custom and reason among those devising
scales and tests and need not be described here. It is sufficient to say
that the procedure is a long and tedious one, requiring much time and
involving possibility of errors.

In the scoring of tests where weights are used, a great deal of time
and effort is involved in adding the assigned weights. It is much
simpler merely to count the number of exercises, problems or questions
completed correctly. The percentage of errors resulting from the
adding of weights is higher than many are aware, as those who have
gone over test papers previously scored by another know. Consequently
if it could be shown that little or nothing would be lost by
using the simpler plan of counting the correct responses, not only
could the time involved in scoring the test papers be materially
reduced and the possibility of error decreased, but the intricate
process of determining the weights could be eliminated. Charters¹
reported that in the case of his Diagnostic Tests for Language and
Grammar, the correlation between the rankings of pupils' scores when
the items of tests were weighted and rankings when no weights were
used was something over 90 per cent. This is very significant and
the question raised is of sufficient importance to warrant further
investigation.
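
The comparison at issue can be sketched in a few lines. Everything in the example is hypothetical: the five item weights, the five pupils' papers (1 = exercise correct, 0 = incorrect), and the helper name `pearson`. The point is only to show the two scoring plans side by side and the correlation between them.

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical weights for five exercises of increasing difficulty.
weights = [1.0, 1.5, 2.0, 2.5, 3.0]

# Hypothetical papers of five pupils.
papers = [
    [1, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1],
]

weighted = [sum(w * c for w, c in zip(weights, p)) for p in papers]
unweighted = [sum(p) for p in papers]
r = pearson(weighted, unweighted)   # about 0.98 for these made-up papers
```

Because pupils who solve more exercises generally solve the harder ones as well, the two sets of scores tend to rise and fall together, which is the reason a high correlation is to be expected.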

It must be admitted that the ideal scale would measure exactly
any amount of the ability or quality under measurement with reference
to an absolute zero. It has not yet been demonstrated that absolute
zero in any school achievement can be determined. In the determination
of zero points of scales where zero points have been located,
methods have been used which aim at but arbitrary approximations.
This being the case, measurement by means of our present supply of
tests and scales gives us but little more than relative standing. For
the purposes of exact measurement of ability as for some purposes of
psychology, something is lost in the matter of comparison of absolute
amounts by using only relative standings. The significance of this
consideration in educational practice is, however, almost an academic
consideration and might easily be overestimated. For the usual uses
of educational achievement tests and scales (for the purpose of
diagnosis, for use in motivation by means of measurement, and for
purposes of comparison and survey) relative standing suffices.

¹ Charters, W. W.: Constructing a Language and Grammar Scale. Journal
of Educational Research, Vol. 1, April, 1920, p. 256.

The Data. One of the authors of this report some time ago completed
the derivation and standardization of a series of diagnostic
tests for the fundamental operations of algebra.¹ The scoring of these
tests involves the use of weights and norms have been determined on
that basis. These tests were given to a class of 25 Junior V (Grade IX)
pupils in the University High School of the University of Oregon.
Two sets of scores were calculated, one with the use of weights, the
other without. For the purpose of checking, coefficients of correlation
were found according to two methods: the method of paired
measures, and the short method of grouping and calculating by group
intervals.²


Series A                                   Pearson method of    Rugg device for
                                           paired measures      grouping
Test I                                     .98 ±.005            .999±.006
Test II, multiplication                    .99 ±.002            .996±.001
Test III                                   .996±.001            .961±.01
Test IV, solution of simple equations      .996±.001            .991±.002

Note. The variations in the coefficients thus found are due
to the grouping of the measures in class intervals and using the
midpoint of each interval to represent in the calculations each measure
in the interval.
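
The effect described in the Note can be illustrated as follows. This is a sketch under assumptions and not a reproduction of the Rugg device: it simply replaces each measure by the midpoint of its class interval (width 5, chosen arbitrarily) and recomputes the product-moment coefficient. The scores and function names are hypothetical.

```python
import math

def pearson(x, y):
    """Product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def to_midpoint(value, width):
    """Replace a measure by the midpoint of the class interval containing it."""
    lower = math.floor(value / width) * width
    return lower + width / 2

# Hypothetical paired scores (for instance weighted and unweighted totals).
x = [3, 8, 12, 14, 19, 23, 27]
y = [5, 9, 11, 16, 18, 24, 26]

r_raw = pearson(x, y)
r_grouped = pearson([to_midpoint(v, 5) for v in x],
                    [to_midpoint(v, 5) for v in y])
```

With these figures the grouped coefficient falls slightly below the ungrouped one; with other data the difference may run the other way, which is why the two columns of coefficients need not agree exactly.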

These correlations were surprisingly high and uniform and it was
decided to proceed with the investigation using other tests. Test
papers written by classes in the same school using the Henmon Latin
Tests and the Gregory Language Tests were used as material, as were
test papers written by members of a Grade VII class in the Ashland
(Oregon) junior high school where the Monroe Standardized Silent
Reading Tests had been given. Coefficients calculated from these
were as follows:

¹ Douglass, Karl R.: A Series of Standardized Diagnostic Tests for the
Fundamentals of First Year Algebra. Journal of Educational Research, October, 1921.

² Rugg, H. O.: "Statistical Methods Applied to Education." Houghton
Mifflin Company, 1917, pp. 200-270.


                                           Number   Pearson method of    Rugg device for
                                                    paired measures      grouping
Henmon Latin Test A                          11     .986 ±.006           .996 ±.0011
Gregory Test for Language                    80     .991 ±.0022          .976 ±.0061
Monroe Standardized Silent Reading Test      32
    Rate                                            .993 ±.0017          .993 ±.0017
    Comprehension                                   .978 ±.0062          .978 ±.0062









RETENTION AFTER LONG PERIODS 
DEAN A. WORCESTER 
Psychological Laboratory, University of Colorado

In the summer of 1916 the author learned twenty-one 100-word
selections from the works of Arnold and Huxley. Twelve of these
were learned by the auditory method and nine by the visual method
of presentation. The method of learning was as follows:

On one occasion the subject was handed a slip of printed material
which was to be read over and over until the subject judged himself
able to repeat the subject matter exactly. On the alternate days the
experimenter read a selection aloud until in the judgment of the subject
the selection could be repeated exactly. The recitations were oral.

Only one selection was presented at a sitting. The whole method 
was always used. 

The rate of reading was not pre-determined but the subject was
allowed to receive the presentation at that speed which seemed to
him desirable. Unless requested by the subject to do otherwise the
experimenter read aloud at a rate of two words per second, that is,
a 100-word selection was read in 50 seconds. Frequently, the subject
would request that the rate of reading be increased as the memorizing
became more nearly complete. This increased speed seemed to be
justifiable as it is unquestionably the natural method.

The subject was asked not to attempt a reproduction of the
material until practically sure that it could be given exactly. As
a matter of fact, it was finally decided to accept and record a recitation
which did not fall below 96 per cent of perfection. Retention was
tested by the retained members method. Learning was carried only
to the point of one successful reproduction and retention was tested
after 1, 2, and 7 days from the time of learning. Times were kept
to the even second with a stop watch.

After approximately 5 years, the retention of this material was
tested by the relearning method with the interesting results shown in
the following table.

It should perhaps be observed that nearly all of the relearning was
done at night and following a day's work at the University.
Average Retention of Material Learned by Auditory and Visual
Presentation After Approximately Five Years

                                 Auditory              Visual
Average lapse of time            4 years, 357 days     4 years, 360 days
Average learning time 1916       13.03 minutes         16.20 minutes
Average learning time 1921       7.24 minutes          8.41 minutes
Average saving of time           44.44 per cent        48.09 per cent


Whereas the material had not been reviewed during the interim,
this seems to be a large percentage of saving.
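
The saving percentages follow from the usual relearning computation, saving = (original time - relearning time) / original time. The short sketch below applies it to the average times of the table, taking the original visual learning time as 16.20 minutes, the value under which the stated 48.09 per cent is reproduced. The function name is hypothetical.

```python
def saving_per_cent(original_minutes, relearning_minutes):
    """Per cent of the original learning time saved on relearning."""
    return 100 * (original_minutes - relearning_minutes) / original_minutes

auditory = round(saving_per_cent(13.03, 7.24), 2)
visual = round(saving_per_cent(16.20, 8.41), 2)
print(auditory, visual)  # 44.44 48.09
```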







DRILL IN ARITHMETIC 

R B, KNIGHT 
University of Iowa


Given the following drill material for a lower grade:


 (1)   (2)   (3)   (4)   (5)   (6)   (7) 
  6     4     1     8     9     8     9 
  7     a     0     7     2     0     0 
  4     2     0     1     7     4     4 
  3     3     6     4     3     3     7 
  2     6     3     6     4     0     8 

 (8)   (9)  (10)  (11)  (12)  (13)  (14) 
  6     4     0     6     0     7     9 
  1     7     2     0     7     6     8 
  8     6     1     2     0     4     7 
  0     1     Q     1     3     0     4 
  8     2     0     6     6     3     6 

(15)  (16)  (17)  (18)  (19)  (20)  (21) 
  8     8     8     9     6     S     4 
  6     0     2     0     8     0     8 
  3     6     4     0     0     4     1 
  0     7     Q     6     7     1     0 
  7     2     7     8     1     3     8 


What actual drill do these exercises provide? How can we estimate 
the strengthening of specific connections through use of the practice 
provided? 

On first glance it would appear that the drill is evenly distributed, 
for: 

0 appears 9 times.      5 appears 12 times. 
1 appears 8 times.      6 appears 11 times. 
2 appears 9 times.      7 appears 12 times. 
3 appears 11 times.     8 appears 12 times. 
4 appears 12 times.     9 appears 9 times. 

What is the practice afforded the specific connections in these exercises? 
Neglecting the difference between a “seen to seen” and an “unseen to 
seen” number and further neglecting the appearance of numbers 
(unseen) in the tens column,* we get the following spread, when adding 
the columns upward: 


TABLE I. SPECIFIC PRACTICE WHEN ADDING UPWARD 

To be read: the number in the column is added to the number in the row. Thus 3 
is added to 4 twice, to 7 no times, to 2 once. 


* With the digits 4, 3, 2 in a column, seen to seen is when the 2 is added to 
the 3; unseen to seen is when the resulting 5, held in mind, is added to 4. 
Neglecting numbers in the tens column would be when, now adding down, the 
unseen 13 (6 + 7) is added to 4: consider now the 3 + 4 only. 

Here the practice allows for 84 exercises of specific connections. 
Fifty different connections receive exercise; 60 per cent of the combinations 
that could be exercised are exercised. Twenty-five of the 84 
chances are limited to 8 combinations, i.e., 30 per cent of the exercise 
goes to 10 per cent of the possible connections to be exercised. 
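
The tally behind such a table can be sketched in a few lines. The code below is a minimal illustration, not the author's procedure: it assumes that each connection is the pair (number held in mind, next number seen), that only the units digit of the running sum is kept, and it uses the column 6, 7, 4, 3, 2 implied by the footnote's example (6 + 7 = 13). The function names are hypothetical.

```python
from collections import Counter


def connections(column, upward=True):
    """List the (held, seen) pairs met while adding one column.

    `column` is given top to bottom. The held number is the units digit
    of the running sum, so the tens column is neglected.
    """
    digits = list(reversed(column)) if upward else list(column)
    pairs = []
    total = digits[0]
    for seen in digits[1:]:
        pairs.append((total % 10, seen))
        total += seen
    return pairs


def tally(columns, upward=True):
    """Count how often each specific connection is exercised."""
    counts = Counter()
    for column in columns:
        counts.update(connections(column, upward))
    return counts


column_one = [6, 7, 4, 3, 2]
print(connections(column_one, upward=True))   # adding upward
print(connections(column_one, upward=False))  # adding downward
```

Adding upward, the first pair is (2, 3) and the second is (5, 4), matching the footnote; adding downward, the second pair is (3, 4), the units digit of 13 being added to 4. A column of five digits always yields four connections, which is why 21 columns give 84 chances.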


TABLE II. SHOWING THE DISTRIBUTION OF SPECIFIC PRACTICE WHEN COLUMNS 
ARE ADDED DOWNWARD 

To be read as Table I 

[Table II: a 10 by 10 grid of tallies for the digits 0 to 9; the cell entries are not recoverable.] 

Here 61 of the possible 84 combinations receive exercise. Seventy- 
two per cent of the combinations which could be exercised are exercised. 

Now if the pupils add both upward and downward for checking 
purposes the exercise on specific combinations is as reported in Table 
III. 

TABLE III. AMOUNT OF EXERCISE ON SPECIFIC CONNECTIONS WHEN COLUMNS 
ARE ADDED BOTH WAYS 

[Table III: a 10 by 10 grid of tallies for the digits 0 to 9, with a sum (S) for each row and each column; the cell entries are not recoverable.] 



Here there were 168 chances for specific connections. One hundred 
connections exhaust the table, but even then only 84 per cent of the 
possible connections receive exercise. In considering the frequency 
with which numbers are added to, i.e., some number is added to 0, 1, 
2, 3, 4, etc., the sums of the columns of Table III show fairly even 
distribution. When we consider the frequency with which 0, 1, 2, 3, 
etc., are added to some number, the sums of the rows of Table III, 
we get uneven distribution of practice, the extremes being 1 added to 
some number 8 times, 3 added to some number 26 times. 

Roughly correlating the frequency with which numbers appear in 
the columns to be added, and the connections which are actually practiced, 
we get a correlation of +0.36 between frequency of appearance 
of numbers in the columns and frequency with which the numbers are 
added to some other number in the actual practice when columns 
are added upward. When columns are added down the correlation is 
+0.46. The correlation between the frequency with which numbers 
appear as the first number of the addition x plus y, and the frequency 
of appearance as the second number, is -0.53. 
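
A correlation of this kind can be computed as below. This is only a sketch under stated assumptions: the article says the frequencies were "roughly" correlated and does not name the coefficient, so the product-moment (Pearson) formula is assumed. The list `printed` follows the frequencies of appearance given earlier, taking 0 as appearing 9 times; the list `practiced` is a hypothetical set of tallies used purely for illustration.

```python
import math


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Frequency with which the digits 0 to 9 appear on the drill sheet.
printed = [9, 8, 9, 11, 12, 12, 11, 12, 12, 9]

# Hypothetical tallies of how often each digit takes part in a connection.
practiced = [18, 18, 12, 17, 15, 17, 17, 21, 16, 14]

print(round(pearson_r(printed, practiced), 2))
```

A value near zero would mean that the printed frequencies tell little about the practice actually given, which is the point of the comparison.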

The columns given at the beginning of this article were constructed 
off-hand; the only thing in mind was to have the several numbers 
appear on the drill sheet approximately equally often. The fact that 
the frequency of the specific drill really provided in adding bears but 
little more than a chance relation to frequency of the several numbers 
as printed, may be due to unconscious preferences for certain number 
combinations by the writer. 

An actual drill exercise from the Grade V section of one of the best 
textbooks on arithmetic was similarly treated. The frequency with 
which numbers appear in the 24 columns is: 

NUMBER        0    1    2    3    4    5    6    7    8    9 
FREQUENCY     0    0    6   16   16   23   10   14   16   12 


The frequency with which some number is added to 0, 1, 2, 3, 4, 
etc., is: 

      Adding Up                 Adding Down 
  NUMBER   FREQUENCY        NUMBER   FREQUENCY 
    0          0              0          0 
    1          0              1          0 
    2          6              2          4 
    3         14              3         13 
    4         10              4         10 
    5         10              5         10 
    6         14              6         16 
    7         10              7         12 
    8         10              8         12 
    9          7              9         11 

The frequency with which 0, 1, 2, 3, 4, etc., is added to some 
number is: 

      Adding Up                 Adding Down 
  NUMBER   FREQUENCY        NUMBER   FREQUENCY 
    0          8              0          8 
    1         <i              1          6 
    2         11              2         12 
    3          8              3          6 
    4          8              4         10 
    5         10              5         10 
    6         11              6          0 
    7         10              7          8 
    8         11              8         14 
    9         12              9         13 


In these exercises there are 96 chances to exercise specific connections. 
There are, of course, 100 combinations. 59 combinations 
are exercised or 61 per cent of the possible distribution of practice 
occurring in the adding up. In adding down 61 combinations are 
exercised or 52 per cent of the possible distribution of practice. 

In this drill material how good an index of distribution is the 
frequency with which the several numbers are printed? 

The correlations between frequency of appearance on the page and 
frequency with which the several numbers are added to some number 
are: 

Frequency in print and frequency in adding up +0.19. 

Frequency in print and frequency in adding down +0.00. 

Frequency in print and frequency in adding both up and down 
+0.26. 

The contention here is not that every drill exercise should spread 
its exercise of specific connections impartially. In some instances 
it should. The contention here has to do with the construction and 
methods of estimating the true nature of a given piece of drill material. 

The above analysis leads to the following considerations: 

1. In constructing drill material the frequency with which different 
numbers appear in the columns for drill in addition is a poor index of 
the distribution of practice actually provided. 

2. Drill material for addition of type x plus y should contain 
proper distribution of the x and of the y. Properly distributing the 
x does not distribute the y. 

3. Construction of drill material, such as the columns used in this 
article, is a trial and error affair. The frequency and order of the 
numbers should be worked over until upon analysis we get, not numbers 
appearing in print with desired frequency, but considering the 
unseen numbers also, numbers appearing in both x plus y positions as 
the following suggests: 

Here t represents the number of times a seen number is added to 
seen, and also unseen to seen, and where t represents the number of 
times the specific connection should be exercised because of its relative 
difficulty and amount of previous practice upon it. Dr. Thorndike 
has pointed out that now the size of t in most arithmetic material 
just happens. 

In making the size of t correct, analysis of the drill material after 
construction and then trial and error rearrangement is a step toward 
the best observation of the law of exercise in arithmetic. 
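
The trial and error rearrangement described above can be sketched as a simple search. Everything in the block is an assumption made for illustration: the desired number of exercises for each connection is supplied as a hypothetical `target` table, connections are counted as (held units digit, seen digit) pairs when columns are added both ways, and a swap of two digits is kept only when it does not move the tally further from the target. It is one possible mechanization, not the author's method.

```python
import random
from collections import Counter


def tally(columns):
    """Count (held, seen) pairs when every column is added down and up."""
    counts = Counter()
    for column in columns:
        for digits in (column, column[::-1]):
            total = digits[0]
            for seen in digits[1:]:
                counts[(total % 10, seen)] += 1
                total += seen
    return counts


def cost(columns, target):
    """Squared distance between the practice provided and the practice wanted."""
    counts = tally(columns)
    return sum((counts[(x, y)] - target.get((x, y), 0)) ** 2
               for x in range(10) for y in range(10))


def rearrange(columns, target, steps=2000, seed=0):
    """Swap pairs of digits at random, keeping swaps that do not raise the cost."""
    rng = random.Random(seed)
    cols = [list(c) for c in columns]
    best = cost(cols, target)
    for _ in range(steps):
        a, b = rng.randrange(len(cols)), rng.randrange(len(cols))
        i, j = rng.randrange(len(cols[a])), rng.randrange(len(cols[b]))
        cols[a][i], cols[b][j] = cols[b][j], cols[a][i]
        new = cost(cols, target)
        if new <= best:
            best = new
        else:
            cols[a][i], cols[b][j] = cols[b][j], cols[a][i]  # undo the swap
    return cols, best
```

Because digits are only exchanged, the frequency of each number in print is unchanged; only the connections exercised are altered. A target that weights difficult combinations more heavily would replace the uniform table used in a first trial.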


































A COMPARISON OF THE ESSAY AND THE OBJECTIVE 
TYPE OF EXAMINATIONS 

DONALD A. LAIRD 
University of Wyoming 

Much interest has been manifest of late months in the true-false, 
yes and no, blank filling types of examination as a means for class 
grading. The chief argument used is that they are time and labor 
saving, objective, correlate highly with the usual essay type of examination 
and with intelligence scores. 

Whether or not they are measuring the same things is quite another 
question but it is one worth considering. This communication 
will report a test which was made to compare these two examination 
methods to find out just how closely they were measuring the same 
thing or things. 

A group of 64 students in elementary psychology were unexpectedly 
told to write an essay on “The Adrenals in Psychology.” They were 
given unlimited time, told to write everything they knew about them. 
A week previously they had completed the study of these glands in 
their course. After the essays were completed the students were given 
a list of 28 questions covering the same ground completely as it had been 
presented to them. These questions were of the objective type so that 
they could be answered by a single word or phrase. Still, they were 
not of the “yes-no” type so that some of the answers could be guesses. 

We thus have two examinations on the same subject for comparison. 
The essay examination was checked over point by point to see how 
many of the 28 topics had been included in the essay on the adrenals. 
Then the two methods were compared by contrasting the number of 
points scored in the essay with the number of points scored on the same 
scale but in a radically different type of examination. 

The results are startling. By grading the essays on the basis of 
distribution of merit a high correlation might have been obtained 
between these two examinations. But when they are compared in 
the amount of information contained the correlation becomes low and 
the difference a very great one. 

The correlation (r) between the two forms of examination is +0.038 
with a PE of ±0.087. In the essay examination the students knew 
approximately half as much as on the other, when checked off against 
the points that they had been given in the course of the subject 
under examination. A correlation on the basis of grades given might 
have been high. But should the sparse sampling of the essay test be 
considered a true measure of the student’s knowledge when other 
tests show that he knows really twice as much about the subject as 
he has written? 
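
For readers unfamiliar with the probable error (PE) of a correlation, the sketch below gives the conventional formula, PE = 0.6745 (1 - r^2) / sqrt(n). The article does not state how its PE values were computed, so this formula and the function name are assumptions, and the printed values need not agree exactly with it.

```python
import math


def probable_error_r(r, n):
    """Conventional probable error of a correlation r based on n cases."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


# Example call with the correlation and group size given in the text.
pe = probable_error_r(0.038, 64)
print(pe)
```

When r is smaller than its probable error, as with the figures reported here, the correlation cannot be distinguished from zero.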

In some quarters the essay type of examination is in favor since 
it is assumed to test and develop the precious mental ability of organization 
of materials. If such is true one would expect the more intelligent 
students to do better on the essay type than did the poor students. 
With them the correlation between the points scored on the two 
examinations should be higher than in the case of the students equipped 
with a lesser amount or an inferior kind of intelligence. 

To test this hypothesis the students were divided into two groups, 
those above and those below the average intelligence of the class. 
With the group of 27 students who were above the class average in 
score on the Thorndike Intelligence Test for High School Graduates 
the correlation (r) between the two forms of examination was +0.107 
with a PE of ±0.121. What little a correlation of this magnitude 
may mean might be construed to lend validity to the assumption that in 
the case of the more intelligent students, the essay type of examination 
is fairer in showing how much they really know about the subject than 
is the case with their lower scoring brethren. 

SUMMARY 

A comparison was made of the showings of a group of students on 
two types of examination covering the same narrow subject. This 
comparison was made, not on the basis of literary merit or the distribution 
of the marks, but rather on the basis of the material presented 
on the topic under examination that the examinations gave evidence 
of the student having mastered. 

When thus compared it was found that: 

1. The average student knows twice as much of the subject when 
tested by an objective, information test as when tested by an essay 
type of examination. 

2. The correlation between the points that are apparently known 
by the students on these two types of test is practically zero. 

3. In the case of the students above the average intelligence a 
higher, but small, correlation is obtained which may indicate that the 
examination test becomes more of an intelligence test than an evaluation 
of the materials gleaned from the content of the course. 



NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 
OTHER MAGAZINES 


REPORTED BY CECILE COLLOTON 


Department of Educational Psychology, The Lincoln School of Teachers College 

INTELLIGENCE TESTS 

Native and Acquired Mental Ability as Measured by the Terman Group Test of 
Mental Ability. Dudley W. Willard. School and Society, 1922, Dec. 30, 760- 
766. Results of retesting 210 high school pupils after an interval of 7½ months 
with the Terman Group Test. Pearson r = 0.876. Half of growth measured 
is due to school training; half, to development of native capacity. 

Classification of Kindergarten Children for First Grade by Means of the Binet 
Scale. Charles B. Dawson. Journal of Educational Research, 1922, December, 
412-422. The advantages of classifying first grade children on the basis of mental 
age. Data on 2020 Detroit kindergarten children. 

Professor Terman's Determinism: a Rejoinder. William C. Bagley. Journal 
of Educational Research, 1922, December, 371-385. Another statement of Dr. 
Bagley’s ideas on general intelligence and mental testing. 

Objective Measures of Intelligence in Relation to High School and College Administration. 
Alexander C. Roberts. Educational Administration and Supervision, 
1922, December, 530-610. Brief summary of the development of intelligence 
tests. Their value and limitations in helping to solve administrative problems. 

The Mental Age of Adults. An Editorial. Frank N. Freeman. Journal of 
Educational Research, 1922, December, 441-441. A review of the arguments 
in the Terman-Lippmann controversy and a defense of the Stanford-Binet Revision. 

A Comparison of the Latin and Non-Latin Group in High School. Edith I. 
Newcomb. Teachers College Record, 1922, November, 412-423. A summary 
of the results obtained in the initial tests given, September, 1921, in over 100 high 
schools by the Classical League of America. 

The Intelligence of a Highly Selected Group. John E. Anderson. School and 
Society, 1922, Dec. 23, 723-726. Scores on Army Alpha rank the Hotchkiss 
student group consistently higher than corresponding classes in high schools and 
colleges. Discussion of the factors in the selection of the group. 

The Correlation between Intelligence and Accomplishment Quotients. I. N. 
Madsen. School and Society, 1922, Dec. 16, 000-097. Explains why the correlation 
between IQ and AQ does not express a true relationship. 

EDUCATIONAL TESTS 

English Composition Scales in Use. Thomas Briggs. Teachers College 
Record, 1922, November, 423-452. Discusses composition scales in general. 
Reports an investigation on standards for promotion in written composition and 
agreement between teacher’s standards. A number of compositions ranging in 
scale values from 4.0 to 8.0 are appended. 

Comparative Validity of the Hotz Scales and the Rugg-Clark Tests in Algebra. 
Eleanora Harris and Frederick S. Breed. Journal of Educational Research, 
1922, December, 393-411. A critical study and comparison of the Hotz Scales 
and the Rugg-Clark Tests in Algebra from the point of view of validity, economy 
of time, difficulties in giving, taking, and scoring, and diagnostic value. Nine 
tables and seven figures present the data. 

Prognosis Tests of Ability to Learn Foreign Languages. Thomas H. Briggs. 
Journal of Educational Research, 1922, December, 386-392. Describes a battery 
of tests designed to test special ability to learn foreign languages. Detailed data 
of results and suggestions for improving the tests are given. 

Analytic Sewing Scale. K. Murdoch. Teachers College Record, 
1922, November, 453-458. Describes the construction of a new sewing scale and 
its advantages. 

Investigations Concerning the Murdoch Sewing Scale. Clara M. Brown. Teachers 
College Record, 1922, November, 459-470. Gives evidence for the validity of 
the Murdoch scale and establishes norms. 

MISCELLANEOUS 

A Study of High School Failures and Their Causes. Harvey A. Smith. Educational 
Administration and Supervision, 1922, December, 667-672. A study 
based on the official school records of over 300 high school pupils. Causes of failure 
are classified and discussed. 

Size of Type as Related to Readability in the First Four Grades. J. Herbert 
Blackhurst. School and Society, 1922, Dec. 16, 697-700. Eighteen-point type 
most readable in Grades 3 and 4; 24-point type in Grades 1 and 2. 

A Scale for Measuring Habits and Practices in Health and Accident Prevention. 
E. George Payne. School and Society, 1923, Jan. 6, 26-27. Discussion of the 
construction of the scale and its value in use. 

Determining Chronological Age in Decimal Parts of a Year. Herbert A. Toops. 
Journal of Educational Research, 1922, December, 438-440. A table for determining 
the exact age of a subject upon a given test date. 



NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


1. Psychology Applied to Economics.* The chief object of this 
book (as stated on page 206) “is to make contemporary psychological 
views more conveniently available to economic students, so that such 
students may work out any consequences which seem to them important.” 
Naturally, interest attaches primarily to the psychological 
views or questions which have most significance for economics, namely 
the “wants” or motives. But when the author began “to investigate 
the specifically economic motives, he found so little agreement on the 
fundamentals of social psychology involved that a reexamination of 
these fundamentals appeared to be necessary.” This reexamination 
occupies about 200 of the 300 pages of the book. The psychologist 
will no doubt have some difficulty in seeing why the student of economics 
can get the contemporary psychological views only by being carried 
back to Aristotle and led up to the contemporary by way of 
Hobbes, Adam Smith, Jeremy Bentham, the Mills, Bain, etc. The lack 
of agreement upon fundamentals of social psychology is due not so 
much to errors in interpreting the psychology of the past, as it is to a 
dearth of definitely established facts. It is just such a body of fact 
that the present day psychologists are endeavoring to accumulate. 
The author of this book presents contemporary psychology in its 
relation to economics so admirably that it seems unfortunate that more 
of the book could not be used for that purpose. 

Stimulus and response are accepted as the basic units of human 
behavior, and the more cumbersome concepts such as “creative 
instinct” are analyzed into simpler though less magic terms. The list of 
instincts given by Woodworth in his “Psychology” is adopted, as well 
as the same author's concepts of special aptitudes and acquired drives. 

Freudian psychology is examined at some length and some useful 
material is salvaged from it, especially as to the potency of hidden 
motives. The concept of the conditioned reflex, recognized as a 
restatement of the old law of association, becomes a very useful means 
of elaborating the simple motives into new and more complex wants. 
Trial and error are seen to play their part in the establishment of social 
habits and customs. 

* Dickinson, Z. C., “Economic Motives.” Cambridge University Press, 
1922. 

Native differences in intelligence and other capacities are granted 
and the necessity recognized of taking these differences into account 
more than is customary among economists in the treatment of thrift, 
providence and similar problems. 

The latter part of the book offers a brief psychological interpretation 
of a number of economic terms such as utility, cost, interest, value, 
work, etc. The analysis of work is especially interesting. There 
need be no single work motive, but rather there are numerous motives, 
simple instinctive tendencies, aptitudes and acquired habits, varying 
from one individual to another and differently played upon from one 
set of work conditions to another. The manipulation of these elements 
of the complex work motive for better work and greater satisfaction in 
work, is suggested as a possibility. 

This book should be read by students of economics not only because 
of its warnings against “undue expectations of psychological touchstones,” 
but because it presents the facts of modern psychology in a 
familiar atmosphere of economic terms and illustrations, and effectively 
points out their application to economics. This economic setting 
may offer some difficulty to the psychological reader as a knowledge of 
the technical meaning of economic terms cannot be presupposed. 

A. T. POFFENBERGER, 

Columbia University. 

2. A Composite Photograph of Modern Psychological Tendencies.* 
This phrase is taken from the author's preface. The fitness of this 
characterization may be judged by the abundant quotations from the 
writings of Holt, Crile, Titchener, Pillsbury, McDougall, Stiles and 
James. 

The first 27 chapters may be grouped under the following heads: 
The instincts, the senses, the emotions, mental processes, fatigue, and 
rest. In the 9 remaining chapters the author discusses such widely 
divergent topics as dreams, suggestion, hypnotism, the mind of the 
crowd, salesmanship, advertising, wit and humor, industrial psychology 
and mental hygiene. 

An unusual feature of the book is the abundance of half-tones 
illustrating laboratory tests and psychological apparatus in use. 

I., Z. 

* Givler, Robert Chenault: “Psychology, The Science of Human Behavior.” 
New York, Harper and Brothers, 1922, pp. 382. 

THE EFFECT OF THE “ACQUAINTANCE FACTOR” 
UPON PERSONAL JUDGMENTS 

F. B. KNIGHT 
State University of Iowa 

One person's opinion of another is a most useful instrument in the 
work of selecting employees or in making decisions concerning the 
promotion or dismissal of a worker. Until objective tests of perform¬ 
ance, ability, character, temperament have been so developed and 
refined that our present tests are but historic curiosities in comparison, 
personal estimates in one form or another must be used. 

During the last few years statistics, psychology, and common sense 
have all aided in improving the technique of arriving at a truer 
estimate of a person through the judgment of his acquaintances. The 
main lines of improvements have been: 

1. Increasing the value of estimates by increasing the number of 
judges. 

2. Determining the reliability of the estimate by proper statistical 
treatment of the judgments. 

3. Directing the judgments through forcing the judges to make 
more analysed estimates by listing specific traits to be judged. 

4. Clarifying the meaning of adjectives such as “honest,” “hard 
working,” etc., by many devices, the “man to man” scheme perhaps 
being the most fruitful. 

5. Bringing to light several errors in rating schemes which must be 
reckoned with in a sophisticated interpretation of the ratings. 

Two pitfalls in rating schemes have been previously pointed out. 
The importance of both is obvious. Thorndike in a study of ratings 
of engineers and Knight in a study of ratings of teachers have shown 
that when specific traits are rated the effect of general estimate seriously 
invalidates the specific ratings as specific ratings. The too high correlations 
between relatively unrelated traits show that a judgment of a 
specific trait is the product of general estimate and the estimate of the 
specific trait. Knight and Franzen have reported data which show 
that self-estimates hide a rather constant tendency to over-rate oneself. 

The subject of this paper is to report the effect of varying lengths 
of acquaintanceship upon personal ratings. “How long have you 
known the applicant?” is a question appearing on recommendation 
blanks used in the personnel departments of most concerns. Of 
what significance is the answer to the question, “How long have you 
known the applicant?” It is commonly assumed that length of 
acquaintance increases the accuracy and the reliability of the judgment. 
If the varying lengths of time of acquaintance are one minute, 
two hours, three weeks, there may be a close direct relationship 
between worth of estimate and length of acquaintance. When varying 
lengths of acquaintance are a matter of years we find that knowing the 
applicant too long apparently decreases the critical value of the judgments. 

The data used in this study are the analysed ratings of 1048 public 
school teachers of one school system made by the supervisors under whom 
the teachers were working. These data were not specially gathered for 
this report. The ratings are a regular part of the supervisor's work. 
Uses of them are genuine in the actual conduct of the school system. They 
are the responsible judgments of responsible supervisors and consequently 
possess a reality that certain types of purely experimental data do not. 

The rating card used is reproduced here. 


Name of Teacher (Surname First) 
                                                      I.      II. 
  I. PHYSICAL EFFICIENCY 
        Health 
        Voice 
 II. SOCIAL EFFICIENCY 
III. DYNAMIC EFFICIENCY 
 IV. CO-OPERATION 
        Response to Extra Curricular Demands 
  V. SKILL IN TEACHING 
        Organization of Subject Matter 
        Presentation of Subject Matter 
        Taking Care of Individual Pupil 
        Popularize Work without Cheapening It 
 VI. CLASSROOM MANAGEMENT 
        Skill in Discipline 
        Neatness of Room 
        Care of Physical Property 
        Attention to Heat, Light, Ventilation 
        Accuracy and Promptness in Handling Reports 
GENERAL RATING 

Principal's Name 
Date 
School 
How long has the above Teacher taught under your Supervision? 

E = Excellent   G = Good   A = Average   F = Fair   P = Poor 

Note. Grade sub heads in column I and main heads in column II. 

As we are concerned with the general problem of the effect of 
varying lengths of acquaintance we can treat the idiosyncrasies of 
individual supervisors as chance errors. The main facts are as follows: 

The Influence of the Acquaintance Factor on Estimates of 
General Ability 

From the rating card shown above we see that definite superiority 
is designated E, merit above the average G, average teaching ability 
A, less than average F, and decidedly inferior ability P. Table 
I reports the percentage of ratings given to teachers known to 
supervisors for varying lengths of time. The column in Table I 
headed “Number of Years,” gives the length of time the supervisor 
has known the teachers. 


TABLE I 

Number      Number         E,          G,          A,          F, 
of years    of teachers    per cent    per cent    per cent    per cent 
   ½            93            8                       25          10 
   1           230           22                       18           4 
   2           292           35          48           13           S 
   3           140           40          46           13           2 
   4            93           45          45            9           1 
   5            60           47          43            5           5 
   6            43           65          30            5 
   7            50           46          42           10           2 
   8            20           50          45            5 
   9             6           83          17 
  10             3           67          33 
  11             4           50          50 
  12             1          100 
  13             7           86          14 
  14             1          100 
  16             1          100 
  20             2          100 
  24             1          100 
  25             1          100 


Let us compress these data into five “acquaintance” groups. 
One group is composed of those rated who were known to the judge 
less than one year. In the subsequent charts this group is called the 
“1-year group” (n = 93). 

The second group will be called the whole group. These are all 
the teachers irrespective of how long the judges were acquainted with 
each teacher, i.e., the factor of acquaintance neglected. The average 
length of acquaintance is about 1.3 years (n = 1048). 

The third group is composed of those rated who were known to 
the judge 7 years. This is called the 7-year group (n = 50). 

The fourth group is composed of teachers known to the Supervisor 
8 years or more (n = 47). 

The fifth group is the fictitious normal distribution. Here the 
marks given being 

E = excellent 
G = good 
A = average 
F = fair 
P = poor 

The group will receive approximately 10 per cent, E; 20 per cent, G; 
40 per cent, A; 20 per cent, F; 10 per cent, P. 
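
The grouping can be reproduced from the counts of teachers in Table I. The sketch below is illustrative only: the dictionary holds the numbers of teachers by years of acquaintance, where the entries for 5, 7 and 9 years (60, 50 and 6) are assumptions chosen so that the totals agree with the n values quoted in the text. The function name is hypothetical.

```python
# Number of teachers by years known to the supervisor (0.5 means half a year).
# The counts for 5, 7 and 9 years are assumed so that totals match the text.
teachers_by_years = {
    0.5: 93, 1: 230, 2: 292, 3: 140, 4: 93, 5: 60, 6: 43, 7: 50,
    8: 20, 9: 6, 10: 3, 11: 4, 12: 1, 13: 7, 14: 1, 16: 1,
    20: 2, 24: 1, 25: 1,
}


def group_size(counts, low, high=float("inf")):
    """Number of teachers known between `low` and `high` years, inclusive."""
    return sum(n for years, n in counts.items() if low <= years <= high)


one_year_group = group_size(teachers_by_years, 0, 0.5)   # known less than a year
whole_group = group_size(teachers_by_years, 0)           # acquaintance neglected
seven_year_group = group_size(teachers_by_years, 7, 7)
eight_plus_group = group_size(teachers_by_years, 8)

print(one_year_group, whole_group, seven_year_group, eight_plus_group)
```

With these counts the four groups come to 93, 1048, 50 and 47 teachers, the sizes stated above.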

[Chart I. Legend: Normal distribution; Teachers of ½ year; Whole group; Teachers of 7 years; Teachers of 8 years.] 


Chart I shows the distributions of the percentages of the marks 
given to the teachers of the 4 acquaintance groups together with a 
rough normal distribution for the trait “General Efficiency.” 

The progressive skewness of the distributions in direct relation to 
the length of acquaintance is evident. The 1-year group most closely 
approximates a normal distribution. The teachers known longest 
are rated in almost grandiose terms. 

Chart II shows the sigma locations of the ratings for the several 
groups upon a normal curve for the trait, general ability. As sigma 
distances are measures of “awayness” from the average, they can be 
used as measures of the significance of a rating. The symbols E, G, 
A, F, P, refer to the ratings. The symbol n represents normal dis¬ 
tribution; 1, the first year group; W, the whole group; 7, the 7-year 
group; 8, those teachers known by their supervisors 8 years or more. 
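
How a percentage distribution of ratings is turned into sigma locations is not stated in the article. One common way, assumed here purely for illustration, is to cut a unit normal curve into segments whose areas equal the percentages and to take the mean of each segment as its location. The function name is hypothetical; `NormalDist` is from the Python standard library.

```python
from statistics import NormalDist


def sigma_locations(proportions):
    """Mean sigma position of each rating category on a unit normal curve.

    `proportions` runs from the lowest rating to the highest and sums to 1.
    The mean of a normal segment is (density at lower cut - density at
    upper cut) divided by the area of the segment.
    """
    nd = NormalDist()
    locations = []
    cumulative = 0.0
    for p in proportions:
        low, high = cumulative, cumulative + p
        density_low = nd.pdf(nd.inv_cdf(low)) if 0.0 < low < 1.0 else 0.0
        density_high = nd.pdf(nd.inv_cdf(high)) if 0.0 < high < 1.0 else 0.0
        locations.append((density_low - density_high) / p if p > 0 else None)
        cumulative = high
    return locations


# Fictitious normal group: P, F, A, G, E receive 10, 20, 40, 20, 10 per cent.
normal_group = sigma_locations([0.10, 0.20, 0.40, 0.20, 0.10])
print([round(z, 2) for z in normal_group])
```

Under this assumption the A rating of the fictitious normal group falls at the average and the E rating well above it. A group in which most teachers are rated E would place E close to the average, which is the loss of significance the chart is meant to show.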

From this chart we see that for a teacher known to a supervisor for 
1 year or less to receive an E rating (E1) means that she is “away” 
from the average about to the extent that would be normally expected 
(En); an E rating for teachers known 8 years or more is by no means as 
significant a rating. G ratings for both the 8- and 7-year groups are 
actually below the average! We are not interested in the over-rating 
tendency alone. The sweep of the ratings away from normal expectancy 
in direct relation to length of acquaintance is certainly illustrated 
by Chart II. 

The sigma locations of the ratings for the several acquaintance 
groups are consistent with or rather support the contention that 
increasing amounts of acquaintance increasingly vitiate the ratings. 
GW is the one “error.” It is misplaced by 0.02 sigma. 

The Effect of the Acquaintance Factor on Ratings for Physical 
Efficiency 

Table II reports the facts for the Physical Efficiency Ratings. 
As in Table I “Number of Years” is to be read, number of years 
teachers have been known by the supervisor who rated. 
Table II

Number      Number        E,         G,         A,         F,
of years    of teachers   per cent   per cent   per cent   per cent

½            88           23         63         14
1           232           24         66         14
2           208           34         40         16          2
3           134           31         61         14          2
4            80           42         48          0          1
5            50           32         60         17          2
6            32           36         44         19
7            60           54         30          6          4
8            18           44         83         17
9-10         10           40         60
11-12         6           40         60
13-14         7           86         14
15-16         1                     100
17-18
19-20
Over 20       2

CHART III. (Curves: normal distribution; teachers of ½ year; whole group; teachers of 7 years; teachers of 8 years or more.)

Chart III gives the distributions of the percentages of the several
ratings for the trait Physical Efficiency awarded teachers of the
different "acquaintance" groups together with a rough normal
distribution. The influence of the "Acquaintance Factor" is apparent.

Chart IV pictures the sigma locations of the ratings for the several
acquaintance groups. The decreasing significance of ratings as
teachers are known for increasing periods of time by those who rate
them is obvious.

The symbols are read as in Chart II.

CHART IV.

The Influence of the "Acquaintance Factor" on Ratings for
the Trait "Social Efficiency"

Table III reports the facts concerning the ratings given for Social 
Efficiency. 

Table III

Number      Number        E,         G,         A,
of years    of teachers   per cent   per cent   per cent

½
1
2
3
4
5
6
7
8
9-10
11-12
13-14
15-16
17-18
19-20
Over 20
Chart V shows the distributions of the percentages of different
ratings given to teachers known for varying lengths of time by the 
supervisor making the ratings for the trait "Social Efficiency." 


CHART V. (Curves: normal distribution; teachers of ½ year; whole group; teachers of 7 years; teachers of 8 years or more.)



Chart VI pictures the sigma locations for the trait, "Social
Efficiency," of the several acquaintance groups. With ratings for social
efficiency as with ratings for other traits the influence of lengthening
acquaintance seems clear.

The Influence of the Acquaintance Factor upon Ratings for
the Trait "Dynamic Efficiency"

Table IV reports the facts concerning the ratings for dynamic
efficiency. This table is read as Table I.


Table IV

Number      Number        E,         G,         A,         F,         P,
of years    of teachers   per cent   per cent   per cent   per cent   per cent

½            86           20         40         27          0          4
1           230           31         46         16          7          1
2           300           43         37         16          6          1
3           140           48         37         16          1
4            80           63         30         15          1          1
5            01           40         38         11          2
6            43           66         30          5
7            60           60         36         10          2          2
8            20           60         30         16          6
9-10          9           78         22
11-12         5           40         40         20
13-14         3           63         38
15-16         1                     100
17-18
19-20         2          100
Over 20       1          100

CHART VII. (Curves: normal distribution; teachers of ½ year; whole group; teachers of 7 years; teachers of 8 years or more. Horizontal axis: ratings E, G, A, F, P.)
Chart VII, read as Charts I, III, and V, shows the distributions
of the percentages of different ratings given to teachers known to the
supervisors for different lengths of time upon the trait "Dynamic
Efficiency."

Chart VIII, read as Charts II, IV, and VI, pictures the sigma
locations of the ratings given the different acquaintance groups for the
trait "Dynamic Efficiency." It reveals further supporting evidence
for the contention that increasing amounts of acquaintance are
associated with decreasing amounts of critical judgment.



SUMMARY

The above data may be summarized as follows: The Acquaintance
Factor had 80 chances to operate. Seventy-three times the significance
of the ratings was obscured; i.e., in comparing an acquaintance
group with a group known by the judge for a shorter time, the sigma
location of the ratings for the longer period was misplaced by a greater
amount than the misplacement of the group known for a shorter time.
There were 7 exceptions. In these instances the significance of the
rating of a longer known group was misplaced by a less amount. The
average misplacement of these exceptions was 0.17.
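One rough way to gauge how unlikely 73 agreements in 80 comparisons would be by chance alone is a simple sign test. The sketch below assumes, purely for illustration, that the 80 comparisons are independent and that each would fall either way with probability one-half if acquaintance had no influence. Neither assumption is made in the data above, and the function name is hypothetical.

```python
from math import comb


def binomial_tail(n, k, p=0.5):
    """Probability of k or more successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


chances = 80      # comparisons reported in the summary
agreements = 73   # comparisons fitting the Acquaintance Factor
exceptions = chances - agreements

tail = binomial_tail(chances, agreements)
print(f"{agreements} of {chances} agree; {exceptions} exceptions")
print(f"chance probability of {agreements} or more: {tail:.2e}")
```

Because many of the 80 comparisons share the same groups of teachers, they are not truly independent, so the figure printed is best read as an order of magnitude rather than an exact probability.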

All groups are over-rated but the amount of over-rating seems to be
a function of the "Acquaintance Factor." There is little reason for
uneasiness about the fairness of this conclusion. The other apparent
explanation is that as teachers gain experience they do gain in power
and so the ratings as reported represent the facts. This is hard to
believe for two reasons. First, several published studies upon the
effect of length of experience upon skill in teaching report but little
relationship between length of experience in amounts implied here and
skill in teaching. Secondly, the explanation of the data as true because
of length of experience would have to assume that length of acquaintance
was a reliable measure of length of experience. It is not. Most of the
teachers in this city had 2 years of experience in teaching before they
taught at all in this city. Further, while no relatively young teacher
could be known for many years by any supervisor, many teachers of
long experience are in the 1-year group; others are not in the 7- and
8-year groups, for frequent transfers of both teachers and supervisors in
this city often cause different combinations of teaching units.

If one were tempted to explain the reasons for this over-rating the 
following three considerations may be found to be a part of the total 
explanation. 

1. A supervisor could hardly admit, consciously or unconsciously,
that "being under his direction brought no improvement." In a way
the longer known teacher being rated higher is a compliment to the
supervisor's influence. All of us are familiar with the mixed truth of
such statements by a supervisor as this: "Teacher A was pretty
poor when I first got her but I have developed her into a first class
teacher."

2. A more subtle tendency operating to a slight extent in the Factor
of Acquaintance may be unconscious identification. Older teachers
are more like the supervisors themselves, at least in age.

In placing teachers under a supervisor attempts are made to make
assignments pleasing to both supervisor and teacher. There is quite
possibly a tendency for certain principals to have one kind of teacher
instead of another as far as rough personality traits are concerned. An
instance of a teacher's rating being affected by the bias of the
supervisor is reported by Superintendent Reed of Akron, Ohio. He asked
each principal to give the name of his best teacher and why that
teacher was best.

Principal A gave as the reason for superiority in his choice, "He is a
he-man."

Principal B gave as the reason for superiority in her choice, "She
holds up high ideals before her pupils."

Principal A is an ex-athletic director. Principal B is an inveterate 
Sunday School worker. Identifying superiority in teachers with the 
judge's pet hobby seems to be operating here. 

3. Negative adaption to persons as well as to environment is a
possible explanation. We get used to people. Quirks of mannerisms
annoying at first soon lose their potency. Teachers who have become
fixtures are accepted while new teachers are still on trial. Little
mistakes, different ways of doing things, not "knowing the ropes," and
other evidences of newness may affect supervisors' judgments of new
teachers far more than they affect efficiency of classroom instruction,
especially when most of the teachers have had 2 years experience
before teaching at all in this city. The consistency of improvement in
Physical Efficiency as teachers get older is a good case in point. No
one would contend that the older a teacher is the more vitality and
physical efficiency she has. Learning to get on with the supervisor
may account for the higher ratings of older teachers as much or more
than actual increase in classroom efficiency.

The data support the contention that, after allowing generously for
increased efficiency due to experience, there remains a Factor of
Acquaintance which tends to make the judge more lenient in
proportion to the length of time he has known the person judged.


The Factor of Acquaintance and Analysis of Specific Traits

We will now report data calculated to show that the Factor of
Acquaintance causes not only greater leniency but also less analysis.
It might be thought that the longer a judge knew the persons judged,
the keener would be his analysis of specific traits; i.e., the well known
halo of general estimate would be less and less. As a matter of fact
critical analysis seems to get less and less and the influence of general
estimate greater and greater as the length of the acquaintance
increases.

The amount of critical analysis is shown by the intercorrelations
existing between the traits. The size of the intercorrelations in this
study is reduced below the usual intercorrelations because of the
restricted range in the distributions. There are but five steps E, G, A,
F, P. Where more steps in the arrays were obtained in similar studies
far higher intercorrelations have existed with great uniformity. We
base our argument here not on the size of the intercorrelations but on
the directions of their increase. All are small because of restricted
range but the intercorrelations get larger in the groups known to the
judges for a longer time.

Presenting the intercorrelations between traits of different
acquaintance groups in order of length of acquaintance we have:
Correlation between      Social     Physical    Dynamic

Physical                  .148
                          .443
                          .369
                          .010

Dynamic                   .376       .280
                          .626       .625
                          .610       .230
                          .611       .605

General                   .405       .403       .620
                          .610       .404       .760
                          .006       .400       .876
                          .606       .647       .709

In each set of four correlations above the top one is that of the
1-year group, the next for the whole group, the third for the 7-year group.
The bottom correlation is for the 8-year group.

If the Factor of Acquaintance tended uniformly to increase the
intercorrelations we could expect the intercorrelations of the 1-year
group to be smaller than those in all other groups; intercorrelations of
the "whole" group to be smaller than those in the 7- and 8-year
groups; intercorrelations of the 7-year group to be smaller than those
in the 8-year group.

With the four traits reported, there are 36 chances for the direction
of change of intercorrelation to be studied. Thirty-one times the
direction of change fits the Factor of Acquaintance theory. Three
of the exceptions are correlations between Physical Efficiency and the
other traits in comparing the whole with the 7-year group. The other
two exceptions are in comparing the 7- with the 8-year group in the
correlations between general ability and social and dynamic efficiency.
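The count of 36 chances can be checked by enumeration: four traits give six pairs of traits, and four acquaintance groups give six pairs of groups to compare for each pair of traits. The short sketch below only reproduces that count; the variable names are our own.

```python
from itertools import combinations

TRAITS = ["General", "Physical", "Social", "Dynamic"]
GROUPS = ["1-year", "whole", "7-year", "8-year"]  # ordered by acquaintance

trait_pairs = list(combinations(TRAITS, 2))   # intercorrelations reported
group_pairs = list(combinations(GROUPS, 2))   # shorter vs. longer acquaintance

# One chance per intercorrelation per pair of groups compared.
chances = [(traits, groups) for traits in trait_pairs for groups in group_pairs]

fitting = 31
print(len(trait_pairs), "trait pairs x", len(group_pairs), "group pairs =",
      len(chances), "chances")
print("exceptions:", len(chances) - fitting)
```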

The average intercorrelation of the

1-year group is +.442

Whole group is +.561

7-year group is +.508

8-year or over group is +.628

The average difference in size between the intercorrelations of the
1-year group and the 8-year group is +0.186 which, in high positive
correlation coefficients, is a significant difference. The average
difference between the sum of the correlations of the 1-year group and the
whole group (average acquaintance 1.3 years) and the sum of the 7-year
and 8-year groups is +0.132.
Just how much intercorrelation there really is between useful traits
is unknown beyond the probability of its being positive and small.
Independent of this consideration, the data support the statement that
the influence of the Factor of Acquaintance is to increase the size of
the intercorrelations, which means that less analysis is taking place.

The Factor of Acquaintance, then, operates to make ratings more
lenient, i.e., increases the over-rating, and to make ratings less critical
and less analytical, i.e., increases the influence of the halo of general
estimate. It is in the direction of truth to discount the ratings of
judges when acquaintance has been long. In a way it is literally true
to say of a judge's estimate: "His judgment is of doubtful validity
because he has known his man too long."



A STUDY OF THE INFLUENCE OF SECTIONING
STUDENTS UPON THEIR
ACHIEVEMENT

DONALD A. LAIRD
University of Wyoming

With the increased student enrollment of universities and colleges 
within the past few years it has commonly become necessary to separ¬ 
ate the students registered for the more popular or required courses 
into two or more sections. Physical limitations in the form of 
restricted seating facilities may make this sectioning imperative; or the 
sectioning may be motivated by the belief that large classes are difficult 
to handle or that the individual student suffers in proportion to the 
size of the class group. 

On the other hand, certain very strong objections are sometimes
raised to this sectioning of the larger classes. There are those
objections which have their root in the experimental work that has been done
on the relative mental efficiency at varying hours of the day. It is
argued that the students of one section may be given an advantage
over the other section simply through the hour at which each group
meets. This advantage, or disadvantage as the case may be, has a
multitude of causes.

The fatigue of the instructor is to be considered in case the sections 
follow one another and are under the same instructor. This fatigue 
would give the advantage to the section meeting first. But should not 
even the most experienced teacher improve somewhat through prac¬ 
tice? Perhaps the loss of the second section due to the instructor’s 
fatigue may be offset by his improvement due to practice. 

The argument, pro and con, in these general terms may be carried
out to any length. The only justifiable method of really determining
whether or not one section has an advantage over the other due to the
fact that the one follows the other or precedes it, is to compare the
records of scholastic achievement of a large number of classes that have
been sectioned. Then there will be some basis in fact, as it actually
works out, for the discussion.

I 

The following portions of this communication will report the
findings of a study of the grades received by the elementary students in
psychology in a large university throughout a period of 10 years.
The records studied extend from the academic year of 1909-1910 until
1921-1922. During this interval the students enrolled in elementary
psychology were divided into two sections. From 1909 until 1915 the
sections met three times weekly. Section A met at 9 and Section B
at 11. The regular class meetings were devoted either to lecture or
to experimental work. From 1915 until the close of the period
covered in this report the sections continued to meet at 9 and 11
respectively but only semi-weekly. These 2 hours were devoted to lectures
and experimental work. From 1915 on, the third hour was spent in
smaller groups at scattered periods in a conference with some staff
member from the Department of Psychology. The records for
1918-1919 are omitted from this report due to the general demoralization
and fragmentary records caused by the Student Army Training Corps.
The 1919-1920 records are also not included since the quarter system
was in use for that year only. All the 10-year records herewith
reported are on the semester basis.

The marking system used in reporting grades is from A to F:
A is given for the best work; F is failing; C, the average. The marks
given for the first 6 years may be fairly compared since they were given
almost entirely by the same instructor. The last 4 years of grades
reported are not quite so homogeneous, the markings for these later
years being in the hands of those who handled the smaller or conference
sections. For these years we must therefore trust to the safety
afforded by numbers and assume that in the long run each conference
section instructor had students equally distributed among the 9 and
11 o'clock sections.

If there is to be any difference found between the achievements of
the two groups one would be led on a priori grounds to expect the
most marked inequality in the first interval of 6 years. During this
interval the two large sections met three times weekly. During the
later interval the large sections met only twice weekly while the
smaller sections met at widely scattered periods for the conference.
Accordingly, in presenting the findings they will be separated into the
two intervals; the first of 6 years and the second of 4 years.

In Table I is given the percentage distribution of the various
marks by years, semester, and section for the first interval. Section
A, it will be recalled, met at 9 o'clock, Section B at 11 o'clock, three
times weekly.
Table I. Distribution of Grades in Per Cent Received by the Two Sections, 1909 to 1915

Section A

               A             B             C             D             E             F          Number
                                                                                                of cases
1909-10     6.3   8.3     --   27.2    40.0  40.0    11.6  13.0     4.6  10.6     4.6   0.0    110   06
1910-11     6.8   6.0     --    --     41.7  61.0    21.2  16.0     6.8   2.0     1.0   0.0    103  100
1911-12     6.4   3.4     --   26.0    46.2  40.0    14.1  20.1    14.1   4.6     1.2   0.0     78   80
1912-13     6.4   7.8    16.1  27.2    64.0  40.0    16.1  14.6     8.0   1.2     0.8   1.2     03   86
1913-14     3.4   6.4    10.4   --     45.0  61.4    22.4  21.0     0.6   2.7     2.0   0.0    116  100
1914-15     3.7   6.1    10.8  18.7    48.6  63.1    24.2  21.4     0.8   0.0     0.0   0.0    101  164
Interval
average     6.2   0.2    10.0  23.8    46.1  48.4    18.0  17.8     8.2   3.4     1.8   0.4

Section B

               A             B             C             D             E             F          Number
                                                                                                of cases
1909-10     1.9   3.8    27.1  20.1    00.1  60.6     8.7  15.5     --    0.0     --    --     103  103
1910-11     3.0   6.7    30.0  20.0    43.4  47.7    13.1   6.7     2.0   7.0     --    2.4     88   70
1911-12     3.0   0.0    20.3  28.0    40.1  43.0    21.8  22.0     8.1   4.4     1.2   0.0     87   91
1912-13     2.8   2.7    21.0  20.7    61.3  64.0    20.1  13.0     3.7   0.0     --    0.0    100  108
1913-14     0.0   3.8    18.6  10.1    63.3  46.0    18.6  27.1     8.1  10.1     1.6   2.6    186  120
1914-15     4.1   --     18.8  23.0    60.8  60.0    10.0  20.1     0.0   1.1     --    --     144  188
Interval
average     2.7   3.4    24.0  26.2    40.8  48.5    16.6  17.1     --    --      --    0.0

The figures to the right in each division are for the second semester.


Chart I compares the accomplishments of the two sections by
showing the difference in the per cent of the marks given to each group.
The plots above the base line represent an excess over the 11 o'clock
section of the grade indicated to the amount indicated by the 9 o'clock
section. The plots below the base line represent an excess of the
grades to the amounts indicated but for the 11 o'clock section. A
study of the figure to the left, in which the semester differences are
indicated, shows that there is a small, but perhaps significant,
difference between the two sections. It is apparent that the earlier sections
have a larger proportion of better and poorer students. Perhaps it
should be stated best students, since the A section exceeds the B
section in students making the highest mark but not in the students





































140 The Journal of Educational Psychology

making the next highest mark, which is still considered to be distinctly
above average in the institution from which the data were secured.
In the first semester's work this difference between the two sections
is more marked than in the second semester. The second semester
has a leavening effect, as it were, which tends to smooth out the section
differences, especially in the lower marks. When we average the
grade distributions for both semesters by sections a difference such as
that graphed to the right of Chart I is found. The section which met
at 9 o'clock is still found to make more of the highest and lower grades
than the other section which met 2 hours later.
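The comparison plotted in Chart I amounts to subtracting, grade by grade, the per cent of marks in the 11 o'clock section from the per cent in the 9 o'clock section. A minimal sketch of that subtraction follows; the two distributions are hypothetical figures chosen for illustration and are not the values of Table I, and the function name is our own.

```python
GRADES = ["A", "B", "C", "D", "E", "F"]


def section_differences(section_a, section_b):
    """Per-grade difference in per cent, Section A minus Section B."""
    return {g: round(section_a[g] - section_b[g], 1) for g in GRADES}


# Hypothetical per cent distributions, for illustration only.
nine_oclock = {"A": 6.0, "B": 20.0, "C": 46.0, "D": 18.0, "E": 8.0, "F": 2.0}
eleven_oclock = {"A": 3.0, "B": 25.0, "C": 49.0, "D": 17.0, "E": 5.0, "F": 1.0}

diffs = section_differences(nine_oclock, eleven_oclock)
for grade in GRADES:
    if diffs[grade] > 0:
        side = "above the base line (excess for the 9 o'clock section)"
    elif diffs[grade] < 0:
        side = "below the base line (excess for the 11 o'clock section)"
    else:
        side = "on the base line"
    print(f"{grade}: {diffs[grade]:+.1f} {side}")
```

Since each distribution sums to 100 per cent, the differences must sum to zero: an excess of one section at the extremes is necessarily balanced by an excess of the other section elsewhere.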

CHART I



Section A apparently either selects or makes superiors and inferiors
while Section B encourages averageness. It is difficult, if not
impossible, to account for this difference as being due to the selection of the
students for the two sections. The schedule of classes in the college
is arranged around the schedule for elementary psychology in order
to prevent conflicts in the class programs of the underclassmen. The
students are placed in one section or the other depending largely upon
the order in which they present themselves before the sectioning desks
on registration days. The peculiar difference between the marks
received by the two sections can also hardly be accounted for by
assuming that there is an unequal sex distribution between the sections.
Records for one year, which are typical, show 69 men and 82 women in
Section A and 63 men and 81 women in Section B.

Just how to account for this unusual difference in the
accomplishments of the two sections is difficult. It is apparent that there is
some difference, and this difference is of a sort to be significant,
although it is not enormous. It is generally agreed among the students
that they are not at their best at the 11 o'clock hour; this is based
upon the verbal reports of the students and not upon any experimental
work. The students further state that they have to exert themselves
unusually to get as much out of the 11 o'clock lecture as they would
have obtained spontaneously 2 hours earlier. Perhaps this added
effort accounts for the minimum of failures and low grades in Section
B. But if this holds equally for all students there should be a general
skew of the curve for Section B toward the higher end. As a matter
of fact there is a skewing in the grade of B but a marked decrease
in the highest mark of A.

CHART II

Some change in the accomplishments of the two sections in the
second semester lessened the difference between the two groups
of students. This should merit our attention. In Chart II are
plotted the distributions of the grades made by each section, by
semesters. This is charted in per cents to make for ease of visual
comparison. Section A, it will be noted, shows a marked improvement
in the second semester; there are fewer grades below average and more
at average and above in the second semester. Section B, on the other
hand, shows no such consistent improvement. There is an increase
in A's approximately equal, in terms of per cents, to the same gain by
the other section. There is an increase of B's of about 1 per cent, a
decrease in the average mark, a slight decrease in the D's, fewer E's,
while the failures remain the same as for the preceding semester.


Table II. Distribution of Grades in Per Cent Received by the Two
Sections, 1915 to 1921
It is this marked improvement in the work done by the 9 o'clock
section during the second semester that brings about a lessening of the
difference between the accomplishments of the two sections. This
improvement on the part of the students who did poor work in the first
semester tends to bring the accomplishment of Section A nearer to
averageness. The 11 o'clock section exhibits no such change in the
work of the second semester.

Summarizing the data for the first 6 years it may be said that the
achievement of over 1300 students in sections of elementary psychology
under actual working conditions indicates that those enrolled in a 9
o'clock section will be more likely to do either better or poorer work
than those enrolled in a section meeting 2 hours later, who will tend
to averageness. It is also evident that greater improvement is
shown by those enrolled for the earlier hour as the course progresses.

CHART III
II

It will be recalled that the data for the last 4 years were kept
separate from the other due to the fact that a complicating factor in
the form of smaller sections known as Conference Sections entered
upon the scene. Table II gives the data for these years in a form
similar to Table I.

Chart III compares the two sections as was done in Chart I.
But notice the difference; there is apparently little rhyme or reason
in their differences in accomplishment now. First of all, it is evident
that the differences are of a smaller proportion; and secondly, they
do not persist in a given direction as for the 6 previous years.

Comparing the calculated average grades for the year for this
4-year interval with those of the 6-year interval (plot to the right in
Charts I and III) the general tendency of both intervals is seen to be
the same. For the later interval, however, the variations between
the sections are less marked, but still Section A is leading, though
barely so, in superiors.

Comparing the marks received in the first semester with those of
the second semester by sections (see Chart IV) it is apparent that
Section A again shows a decided improvement in the second semester.
Section B also begins to show signs of improving in the marks received
in the second semester. It will be recalled that for the first interval
of 6 years Section B showed but little, if any definite, improvement in
the second semester over the first.

CHART IV

Summarizing the data for the second interval of 4 years it may be
said that with the elimination of one of the hours for the meeting of
the general sections and the substitution therefor of a smaller
Conference Section there is a lessening of the differences between the
marks received by the two general sections, although there still persists
a small factor of advantage favoring the 9 o'clock section.

III

It remains for some explanation to be attempted to account for
the variations in the achievements between the two sections.
Reverting to the general discussion, one would not expect a great difference
between the sections. After all, the time spent in the lecture room is
only about half that spent by the ideal student in following through
the course. Readings are perused under the same conditions as would
be followed whether the student were in one section or another. Before
the advent of the Conference Sections there was no opportunity for
recitation worthy of the name during the lecture hour. Since most of
the material presented in the lectures is not foreign to the required
readings one would not expect the mere matter of what hour the
lecture is attended to make a great difference in determining the
accomplishments of the students in a given course. But the factual material
which has been examined seems to indicate that the lecture hour does
have some effect. Of course, the difference between the
accomplishment of the sections is not great enough to make or break either, but
still it is great enough to be significant, theoretically at least, if not
practically.

Gates found* that about 4 times as many students preferred the
9 o'clock hour as favored the 11 o'clock hour. This is the general
conviction of college students, in the writer's belief. But still Gates,
and others, have found that for such activities as auditory memory,
rapidity of learning as measured by a substitution test, and
recognition, there is a factor of advantage, regardless of what that factor may
be, that favors the 11 o'clock worker rather than the 9 o'clock worker.
The average of all Gates' Tests slightly favors the 11 o'clock hour.


* Gates, Arthur I.: Diurnal Variations in Memory and Association,
University of California Publications in Psychology, Vol. I, No. 5, pp. 323-344.





But still, under actual classroom conditions for a period of years,
students at 9 are making better marks than those at 11. Of course,
instructors' marks are not the acme of accuracy but when tendencies
so definite as those found are present they are not to be scoffed away.

Nor is one's feeling or notion of mental efficiency always the
most precise index of his real working condition. But it is these
feelings of fatigue or readiness that determine largely the responses
of the students to certain situations. The verbal reports of the
students in the sections for 1921-22 are overwhelmingly to the effect
that they have to force an alertness and attitude of attention at the
11 o'clock hour. The learning attitude of Section B is much more
favorable to learning than that of Section A. The correlation (r)
between marks and the composite score on the Thorndike Intelligence
Test is 0.21 for the 9 o'clock section and 0.32 for the 11 o'clock section,
for the first semester of the academic year 1921-22. The Thorndike
Test was taken at the same hour under the same conditions by both
sections. The correlation is higher for the 11 o'clock section, perhaps
because students go about Intelligence Tests with a forced attitude
similar to that which the students of Section B say they have to assume
before the noon hour.
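The coefficient r quoted above is presumably the ordinary product-moment correlation, although the text does not say so. The sketch below computes that coefficient on made-up figures; the test scores, the numerical coding of marks, and the function name are all assumptions introduced for illustration and are not the 1921-22 records.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    sum_yy = sum((y - mean_y) ** 2 for y in ys)
    # Fails with ZeroDivisionError if either list has no variation.
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical data: test scores and marks coded F=0 ... A=5.
test_scores = [60, 72, 85, 90, 55, 78, 66, 81]
marks = [2, 3, 3, 5, 3, 2, 4, 3]

r = pearson_r(test_scores, marks)
print(f"r = {r:.2f}")
```

A coefficient as low as those reported means that the test score accounts for only a small part of the variation in marks in either section.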

The deficiency of Section B in high and low marks may be
accounted for in part by comparing this with Gates' findings. Gates'
work with college students showed that their variability was greater
at 9 o'clock than at 11 o'clock, except for logical memory. This may
explain why the 9 o'clock section had a wider distribution of marks
than the 11 o'clock section.

IV 

A study of the scholastic accomplishments of 2,700 students in
a course in elementary psychology shows that dividing these students
into two sections which met at 9 and 11 o'clock respectively brings
about the following variations:

1. The 9 o'clock section makes both better and poorer grades than
the 11 o'clock section, while the 11 o'clock section tends toward
averageness in accomplishment;

2. The difference between the accomplishments of the two
sections is greater when the sections meet at the hours designated
three times a week than when they meet twice a week, with a
conference period substituted for a third lecture;

3. The variations between the sections in marks earned become
less as the course is continued, more so for the three weekly lectures
than for the two;

4. Students in the 9 o'clock section show greater improvement as
the course progresses than do the other students;

5. Intelligence, as tested by a standard test, correlated higher with
the accomplishments of the 11 o'clock group than with those of the
9 o'clock group.



EFFICIENCY OF INSTRUCTION IN UNSELECTED SECTIONS
IN ELEMENTARY PSYCHOLOGY COMPARED
WITH THAT IN SECTIONS SELECTED
ON BASIS OF INTELLIGENCE TESTS

HAROLD E. BURTT, LAURA M. CHASSELL AND ELIZABETH M. HATCH

Ohio State University

The practical use of intelligence tests in universities has hitherto
been limited for the most part to the solving of problems concerned
with students considered as individuals. The results of tests have
been employed as a supplement to or substitute for entrance requirements,
and as a means of dealing with such problems as those presented
by students who are delinquent, on probation, or desirous of carrying
extra hours. The following study deals with the application of the
results of intelligence tests in the division of university students into
groups for purposes of instruction. The main problem of the study is
to determine whether a group of students fairly homogeneous in intelligence
can be taught more efficiently than a heterogeneous group.

During the year 1920-1921 each of the three writers was scheduled
for a section in elementary psychology at the same hour. Nearly
all of their students either took the university intelligence test (a
somewhat modified army “Alpha”) early in October or had taken
Alpha the previous year. The two tests were of slightly different
difficulty, but were subsequently equated by converting raw scores into
percentile scores, using as a basis the scores made by over 2000 freshmen
in each case. On the basis of these intelligence scores, the students
in the 3 sections were redistributed, as soon as the returns from
the intelligence tests were available, into sections more nearly homogeneous
in intelligence. A similar distribution was made at the opening
of the second semester. The personnel of the same section during
successive semesters remained fairly constant although there were certain
drops, withdrawals, additions, transfers, and slight changes in
distribution of the sections as indicated below. Since each semester’s
experiment was evaluated as a unit, constancy of personnel was not
requisite. In addition to these sections scheduled at the same hour,
each writer had one or two sections of students taking the same
course at a different hour of the day. Thus for each selected section
there was available one or more control sections taught by the same
instructor.

Table I shows how the experimental sections were distributed on
the basis of intelligence. The table gives for each semester the number
of students in each section, the mean percentile intelligence, the standard
deviation, and the range of intelligence. For example, the first
line indicates that the selected high intelligence group during the first
semester comprised 47 students whose mean intelligence percentile
was 83 with a standard deviation of 9 and a range from 65 to 109.
For the second semester, the corresponding figures were 49, 84, etc.

Table I.—Intelligence Percentiles of Selected and Control Groups

¹ One apparently bright student was retested and proved to be 57 percentile rather than 80.


The next row gives similar data for the control section taught by the
same instructor. Sections for first and second semester listed in the
same row had considerable continuity of personnel. It is obvious that
all the sections involved in the experiment were either distinctly
homogeneous in intelligence or else distinctly heterogeneous, the
standard deviation of the selected sections being as a rule only slightly
more than one-third that of the control groups. The difference in
the average intelligence of the selected sections was also very marked,
the average percentile ratings during the first semester for the 3 groups
being 83, 64, and 21, respectively; and during the second, 84, 54, and
22. One qualification should be made regarding the personnel of
the selected medium intelligence group M, especially of the first
semester. Since the intelligence records of a number of students
were not available at the time of the original distribution, all such
students were placed in the medium intelligence section in an effort
to keep at least the extreme sections homogeneous. When these
students subsequently took the intelligence test, it developed that 17
of them properly belonged in one or the other of the extreme sections.
The data in regard to these students are omitted in the tabulated results.
The second semester’s distribution largely rectified this condition,
only 7 students who belonged in the extreme sections remaining in
the medium section.
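
The equating of two tests of unequal difficulty by percentile scores, as described for the two forms of Alpha, may be illustrated by a short sketch. The norm-group scores below are hypothetical, as is the function name `percentile_rank`; the midpoint treatment of tied scores is an assumed convention, since the text does not state how ties were handled.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, reference_scores):
    """Percentile rank of `score` within a reference (norm) group.

    Midpoint convention (an assumption): scores below count fully,
    scores tied with `score` count half.
    """
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)
    ties = bisect_right(ordered, score) - below
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical norm groups for two test forms; form B runs about 10 points easier.
norms_form_a = [40, 55, 60, 72, 80, 95, 101, 110, 125, 140]
norms_form_b = [50, 65, 70, 82, 90, 105, 111, 120, 135, 150]

# Different raw scores on the two forms can occupy the same relative position.
rank_a = percentile_rank(95, norms_form_a)
rank_b = percentile_rank(105, norms_form_b)
print(rank_a, rank_b)
```

Once both tests are expressed as percentiles of their own norm group, students who took different forms can be ranked on a single scale and assigned to sections accordingly.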

The data include only students who completed the work of the 
course and who took the intelligence test. 

For all the sections involved the content of the course was similar,
for they used the same textbook, and covered approximately the same
ground. The methods of instruction likewise were fairly similar, the
work of the course consisting of textbook assignments, lectures, and
oral and written quizzes. Moreover, the similarity in content and
method was naturally much greater in the sections taught by the
same instructor. Direct comparison between these sections is thus
of greater significance than comparisons between other sections, and
is the aspect of the study most emphasized.

Comparison between the selected sections is also possible, however,
a common examination having been given to all at the close of the
first semester as a part of the regular final examination in the course.
It was arranged by the three instructors in consultation, and comprised
35 statements, some true and some false. The students
marked them accordingly, and were scored number right minus
number wrong.

During the first semester the rate of progress maintained in all
sections was practically normal. During the second semester, however,
the selected high intelligence group and the selected medium
intelligence group and its two controls were “pushed.” The high
intelligence group was given longer assignments than its control and
covered the course as outlined in 27 hours instead of 37, the remainder
of the time being devoted to discussion of various aspects of applied
psychology. The medium intelligence group covered the ground in
30 hours instead of 37. The low intelligence section maintained the
normal rate of progress.

The theory underlying the principal aspect of the investigation is
as follows: If the instruction given to a group of students is such that
each one is stimulated to do his best, the correlation between intelligence
and the academic marks obtained in a given course will be high;
conversely, if intelligence correlates highly with marks, it is probable
that the instruction was efficient in stimulating each student to use
his maximum intellectual ability. If this criterion is valid, it is
possible to compare efficiency of instruction in a selected homogeneous
psychology section with that in an unselected control section taught
by the same instructor. This is the principle upon which the results
of the experiment have been evaluated.

Table II presents all the results from the standpoint above indicated.
Each coefficient in the table represents the correlation between
intelligence and final mark for the semester’s work in elementary
psychology for the group and semester indicated. All coefficients
were computed by the product-moment formula, using intelligence
percentiles and quantitative statements of psychology grades. The
coefficients in brackets for the selected groups are the regular Pearson
coefficients, while those directly to their left are augmented for restriction
in range of intelligence due to selection—the homogeneity of
these selected groups lowering the correlations secured.¹

Table II
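
The augmentation of a coefficient for restriction in range may be illustrated as follows. This is a sketch only. It assumes the standard correction R = rk / sqrt(1 - r^2 + r^2 k^2), where r is the correlation in the narrow group and k is the ratio of the wider to the narrower standard deviation; the function name and the numerical values are hypothetical and are not taken from Table II.

```python
import math


def correct_for_restriction(r, sd_narrow, sd_wide):
    """Estimate the correlation over the wider range from the narrow-range r.

    Assumed formula (standard range-restriction correction):
        R = r*k / sqrt(1 - r^2 + (r*k)^2),  with k = sd_wide / sd_narrow
    """
    k = sd_wide / sd_narrow
    return (r * k) / math.sqrt(1.0 - r * r + (r * k) ** 2)


# Hypothetical example: a selected section with one-third the spread of its control.
r_narrow = 0.30
R_wide = correct_for_restriction(r_narrow, sd_narrow=9.0, sd_wide=27.0)
print(round(R_wide, 2))
```

When the two standard deviations are equal the function returns r unchanged, and the more homogeneous the selected section, the larger the upward adjustment.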

¹ It is possible to determine the correlation that would be expected if the intelligence
had covered a wider range, knowing the standard deviation of the distributions
of the smaller and the wider ranges, by the use of a formula for which the
writers are indebted to Professor T. L. Kelley. It is as follows:



where r is the correlation between the two variables when x has the smaller range
with dispersion σx; and R, the correlation when x has the wider range with dispersion
Σx.

By the application of this formula the coefficients over the narrow range have
been augmented to those which would be expected if the range had been the same
as that of the corresponding control section or sections.

The crucial comparisons in the table are those between a selected
section and its corresponding control or controls. With respect to
the section selected for high intelligence and its control, it is obvious
that in the first semester there was no greater efficiency of instruction
in the selected group than in the unselected group taught by the same
instructor. In fact, the correlations of 0.32 and 0.46 point somewhat in
the other direction. The results for the second semester were quite
different, for the correlation in the selected group jumped to 0.73,
while that for the control remained about the same as before. The
explanation is not far to seek. During the first semester the two
groups were kept at the same pace. The daily assignments were
practically identical. It would appear that this pace was not sufficiently
rapid for the selected group. During the second semester the
control group continued approximately as during the first, while the
high section was “pushed.” Apparently this procedure was more
stimulating. The obvious conclusion is that from the pedagogical
standpoint nothing is gained by grouping students of superior intelligence
for instruction under the conditions of this experiment unless
they are forced to cover the material of the course at a more rapid pace
than the average. Not only are they obviously able in the latter case
to cover the ground more rapidly, but the individual student comes
nearer working at his maximum intellectual efficiency.

A similar tendency is found in the returns for the group selected
for medium intelligence and its controls. The results for the first
semester should perhaps be discounted slightly, owing to the presence,
as above mentioned, of a number of students in the selected group
who did not belong there. But at any rate it is obvious that there is
no evidence of more effective instruction in the selected group during
the first semester. The groups during this semester, it will be
remembered, proceeded at the usual rate in their study. During the
second semester, however, when both the selected and the control
groups were “pushed,” the correlation between intelligence and final
mark for the former jumped to 0.76, while those for the two control
groups were 0.15 and 0.45. It would seem, then, that from the
standpoint of effective teaching as measured by the criterion laid
down above, nothing is gained by grouping together students of
medium intelligence if the instruction proceeds in the usual manner.
If, however, the students are forced to work harder than the average,
the individual students come nearer working at their maximum intellectual
efficiency. Harder work, however, does not have this effect
with a group of heterogeneous intelligence.

When the results in the case of the low intelligence group and its
controls are considered, it is found that they are in striking contrast
to those reported for the high and medium intelligence sections and
their controls. Thus the first semester correlations are uniformly
high whether the experimental group or its two controls are considered.
On the other hand, two of the second semester correlations are relatively
low while one is still relatively high. Further, examination of
the correlations obtained for the low intelligence group in comparison
with the correlations for its two controls shows a similar contrast
with the results for the high and medium intelligence sections and their
controls. During the first semester, the low intelligence section
showed a slightly higher correlation than either of its controls, the
difference being the more marked in the case of Section D, which is
more nearly comparable to it in size and constituency than Section F.
During the second semester, although there is a similar difference in
favor of the low intelligence group when compared with Section E,
which participated in the experiment only during the second semester,
there is a very great disadvantage when compared with Section F.

The explanation of these results is somewhat in doubt; but probably
a number of circumstances contributed. In the first place,
possibly all groups taught by the instructor of the low intelligence
section, particularly the low intelligence section itself, were forced to
work more nearly up to maximum intellectual capacity during the
first semester than during the second. There was no effort on the
part of the instructor to this end except in the case of the low section,
which necessarily had to work unusually hard because of the loss of
time incurred by the readjustment following the regrouping on the
basis of intelligence, which took place a considerable time after the
opening of the first semester. To the extent to which the sections
were in reality “pushed” the first semester, an explanation is afforded
of the relatively high correlations for this semester. There is, however,
no striking difference between the correlations for the selected section
and those for the controls as a result. In the second place, during the
second semester the subject matter was more abstract and the lecture
method was employed to a greater extent due to inadequacy of the
text book. Both these facts might have resulted in an emphasis on
verbal memorization which might conceivably have a lower correlation
with intelligence than did general grasp of psychological subject
matter during the first semester. In the third place, Section F had
a conscientious and hard working personnel, a number of the class
being mature school teachers, so that higher correlation would be
expected.

The results with the low intelligence section and its controls indicate
that under the conditions of the experiment nothing is gained from
the standpoint of effective instruction by grouping together students
of low intelligence. It is to be remembered, however, that no effort
was made to “push” the low intelligence sections or their controls
as was the case with the medium and high groups, so that the results
for this section neither corroborate nor disprove the findings with
reference to the covering of the content of the course at a more rapid
pace.

The comparisons above described were all between a selected
section and its control (or controls) taught by the same instructor.
A direct comparison of the three selected sections was possible from
but one standpoint—the common portion of the final examination
at the end of the first semester. The mean score (number of items
right minus number wrong) for the high section was 20.1; for the
medium section 17.1 and for the low section 15.1. The difference
between the means for high and medium is 3.4 times the probable
error of difference; that between the high and low is 4.9 times the
probable error while that between the medium and low is 2.1 times.
While the number of items of score in the common examination is
small it is interesting to note the correspondence of these three averages
with the intelligence of the groups. The difference between the
extreme groups is certainly significant.
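
Readers accustomed to standard errors rather than probable errors may find the following conversion helpful. It is a sketch that assumes the usual relation for a normal distribution, probable error = 0.6745 × standard error; the function names are illustrative, and the two-sided probabilities are a modern gloss rather than part of the original analysis.

```python
import math

PE_PER_SE = 0.6745  # probable error expressed in standard-error units (normal curve)


def pe_ratio_to_z(ratio_in_pe):
    """Convert 'difference / probable error' into 'difference / standard error'."""
    return ratio_in_pe * PE_PER_SE


def two_sided_p(z):
    """Two-sided normal tail probability for a critical ratio z."""
    return math.erfc(abs(z) / math.sqrt(2.0))


# Ratios reported in the text for the common examination.
ratios = {"high vs. medium": 3.4, "high vs. low": 4.9, "medium vs. low": 2.1}
for label, ratio in ratios.items():
    z = pe_ratio_to_z(ratio)
    print(f"{label}: {ratio} PE = {z:.2f} SE, two-sided p about {two_sided_p(z):.4f}")
```

On this reading the difference between the extreme groups remains well beyond three standard errors, while that between the medium and low groups falls short of two.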

Summary 

Students in three sections in Elementary Psychology meeting at the
same hour were redistributed on the basis of tested intelligence. Each
of the three instructors of these sections taught an additional section
or two as control. The correlation between final course mark and
intelligence in a given section was taken as a criterion of the effectiveness
of instruction in that section. The crucial point of the study
was such correlation in a selected section fairly homogeneous in intelligence
compared with that in an unselected heterogeneous section taught
by the same instructor.

In the case of the high section there was nothing gained by the selection
unless this section was forced to cover the material of the course
at a more rapid rate than the average. If the pace was thus increased
the individual students worked more nearly at their maximum intellectual
level. The same result was found with the homogeneous section
of medium intelligence. With the low section there was likewise
no apparent advantage in the selection when the rate of progress was
unchanged, but results bearing upon the other point were not available
as the low sections were not “pushed.”

The principal fact brought out by the study is that division of
elementary psychology students into sections fairly homogeneous in
intelligence is not of itself necessarily an advantage from the standpoint
of efficiency in instruction. In the case of at least high and average
intelligence sections the advantage of such division may be secured
by covering the content of the course at a more rapid pace.



AN ATTEMPT TO DETERMINE ANOTHER ETIOLOGICAL
FACTOR OF STUTTERING THROUGH
OBJECTIVE MEASUREMENT

MAY KIRK SCRIPTURE AND WINIFRED BOYD KITTREDGE¹

Vanderbilt Clinic, Neurological Department, College of Physicians and Surgeons,

New York

The ability to communicate with one’s fellows by speech is a
learned reaction, just as the ability to communicate by writing words
is a learned reaction. Man is not born knowing how to speak, read,
or write; all of these things must be learned. However, men are born
with varying capacities for these achievements; some may learn to read
and write more readily than others, but the point is that these reactions
to a social environment are an accumulation from the racial heritage
and must be learned in accordance with the original nature man
inherits.

In an examination of stutterers the procedure must be the same as
in any other investigation of learned reactions. The laws of learning
are operative in learning to speak in just the same way as they are in
learning to read, write, or typewrite. Individuals do not learn in some
inexplicable manner; the learning process follows certain fixed natural
laws just as a falling body is drawn toward the center of the earth by
the law of gravitation. If you would know how an individual learns to
speak, then you must examine the natural laws which govern this
process. These, as before suggested, are the laws of learning. According
to Thorndike (Educational Psychology, Briefer Course, pp. 70-73)
there are three primary laws of learning. They are the laws of
instinct, exercise, and effect.

The law of instinct can be readily understood from the following
quotations from Thorndike (Educational Psychology, Vol. 1, p. 135):
“A little child apart from training, makes all sorts of movements of
the vocal cords and mouth-parts resulting in cooings, babblings,
yellings, squealings and squawkings of great variety.”

“I repeat that vocalization means, roughly, the responding by
many different sounds in many different sequences to many different
external situations, and that from it develop, under training, speech,
song, and other vocal arts.” (Ibid., p. 138.)

¹ This investigation is the first to be reported of a series which the writers have
undertaken; therefore the conclusions reached are to be held tentatively until the
research has been carried further.

From the above quotations it is plain that the law of instinct means,
in connection with acquiring speech, that an individual inherits a
tendency to respond at an early age to different situations by “cooings,
babblings, yellings, squealings and squawkings.” These noises are
man’s potential speech. A child possessing a tendency to make such
noises must be taught to change and modify them into articulate
speech. Thorndike generally calls the law of instinct the law of readiness.
The term is suggestive, for man’s instincts are the moving,
driving, dynamic forces which keep him “ready” to respond to the
varying situations of his environment.

The law of exercise is of this nature: When an individual is confronted
with a situation and makes a response to it, a connection or
bond is formed between that particular situation and response with
the result that if this same situation is again presented to the individual
he will react probably with the same response. The more often
an individual exercises a connection or bond between a given situation
and response, the stronger will such a connection become. The
converse of the law of exercise is also true—the more seldom an
individual exercises a connection between a given situation and
response, the weaker will such a connection become.

The following is a concrete example of the law of exercise in the
case of an individual learning to speak: A cat walked across the floor
of a room in which a young child was seated. Upon seeing it he
stretched out his hands and made a squealing noise. “Cat,” said his
mother; “cat,” repeated the child. In this instance, the moving
animal, with the mother’s designation “cat,” was the situation, and the
child’s reply “cat” was the response. In other words the child
formed a connection or a bond between a situation and a response;
he modified his instinctive squeal into the articulate word “cat.” In
a manner something like this the child continues to bind together situation
and response in building up a vocabulary. According to the law
of exercise, the more often a bond is exercised the stronger it becomes,
and conversely, the less often it is exercised the weaker it becomes.

The third law of learning, the law of effect, is explained this way
by Thorndike: “To the situation a modifiable connection being made
by him between an S (situation) and an R (response) and being accompanied
or followed by a satisfying state of affairs, man responds,
other things being equal, by an increase in the strength of that connection.
To a connection similar, save that an annoying state of
affairs goes with or follows it, man responds, other things being equal,
by a decrease in the strength of the connection.” (“Educational
Psychology, Briefer Course,” p. 71.)

To illustrate the law of effect concretely, suppose the mother, in
the situation above described, smiled approvingly and patted the
child when he responded “cat”; the response would then have become
satisfying to him, and other things being equal, he would tend in the
future to make the same response to the same situation. Suppose,
however, the converse were true—instead of smiling and patting the
child at his response, the mother scowled, became angry, and perhaps
shook the child; his response would then have been unsatisfying, and
other things being equal, he would have tended in the future not to
make this response to this situation.

Thorndike concludes his discussion of the laws of learning with this
concise statement: “These tendencies for connections to grow strong
by exercise and satisfying consequences, and to grow weak by disuse
and annoying consequences, should, if importance were the measure
of the space to be allotted to topics, preempt at least half of this inventory.
As the features of man’s original equipment whereby all the
rest of that equipment is modified for use in a complex civilized world,
they are of universal importance in education. They are the
effective original forces in what has variously been called nature,
training, learning by experience, or intelligence.” From this general
account of the laws of learning we shall turn to the specific task of
finding out how they function in the stutterer’s acquisition of speech.

(The data for the remainder of this discussion were collected from
a psychological examination of 62 stutterers who were admitted to
the speech department of the Vanderbilt Clinic during 1921. The
Stanford Revision of the Binet-Simon Scale was used.) From an
examination of the history of each of these patients it was very clear
that the stutterer’s difficulty began while he was forming the bonds
between situations and responses which resulted in his building up
habits of speech. When admitted to the clinic the median age of this
group of 62 individuals was 12 years, 8 months. This does not
mean, however, that they began to stutter at 12 years, 8 months,
but only that the difficulty had developed to such a state that it was
necessary for them to come for treatment. It was difficult to find out
the exact time of the onset of the stuttering. Very few of the patients
could designate a definite time, and upon questioning relatives of
those who did designate a time for the onset, there seemed to be a
reasonable doubt of its authenticity. By far the most common answer
received was, “I began to stutter when I was a little boy or girl.
I don’t know how old I was.”

Kenyon says (“The Nature and Origin of Stammering”): “Certainly
more than 95 per cent of the cases of stammering develop during the
period when the young child is struggling to gain control of the complex
speech function.” Then later, in the same work, he says, “It is
doubtful whether one in two, or even three, or four hundred cases is
initiated after the eighth year of life.” Kenyon, however, offers no
definite data. From figures given by Scripture and Glogau on the onset
of stuttering (Journal of Nervous and Mental Disease, Vol. 42, Jan.,
1916) a median age of 6 years was found in 108 cases; the range was
from “earliest childhood” to 15 years. No other investigators seem to
have collected data on the age of an individual when stuttering begins.
We can, however, from the data we have, be reasonably assured that
the onset is quite early and comes on when the individual is forming
the habits of speech.

With this point settled with some surety, another problem comes
up: How does stuttering begin? What causes it? Many opinions
have been offered on the etiology of stuttering, a few of which are briefly
summarized:

Swift (Journal of Abnormal Psychology, 1915-1916) gives his idea of
what causes stuttering in this manner: “Psychological analysis shows
stuttering is an absent or weak visualization at the time of speech.”

Fletcher offers a reason like this: “Stuttering, therefore, seems to
be essentially a mental phenomenon in the sense that it is due to and
dependent upon certain variations in mental state.” (An Experimental
Study of Stuttering. American Journal of Psychology,
April, 1914.)

Bluemel (“Stammering and Cognate Defects of Speech”) outlines
his theory this way: “A transient auditory amnesia, the stammerer
being unable to recall the sound-image of the vowel that he wishes to
enunciate.”

Scripture (“Stuttering and Lisping,” p. 5) writes: “The most
frequent cause of stuttering is a nervous shock. The shock may be
produced by practical jokes, severe falls and surgical operations.”

Kenyon defends his theory this way: “The conception here
presented finds the origin of the disorder in relatively light childish
perversions of the psycho-physical processes required for the development
of the complex speech function, and places great weight on the
development of these light beginning perversions.”

Hudson McKuen stated that heredity was the most important factor
in the etiology of stuttering, and this notwithstanding the fact that
stuttering is an acquired affection, in the sense that speech itself is
acquired. (The Therapeutic Gazette, June, 1914.)

Gutzmann agrees that heredity is a very important factor, but he
considered stuttering more or less a matter of temperament, claiming
that most stutterers are excitable and hasty. (“Sprachheilkunde,”
Berlin, 1912, p. 373.)

Schrank believed that stuttering was mostly found among the
mentally deficient children. (“Das Stotterübel,” Munich, 1877.)

Liebmann considered nervousness the real foundation for stuttering.
(“Vorlesungen über Sprachstörungen,” Berlin, 1800.)

Schmalz thought that a cramped condition of the vocal cords was a
primary cause for stuttering. (“Über Stammeln und Stottern.”)

Wincken held that in all stutterers the will power is bounded by a
language doubt. (Über das Stottern, Henle und Pfeufer’s Ztsch., Vol. 31.)

Freud and Steckel believe that stuttering is the outward expression
of an inward mental conflict. (Freud, “Zur Psychopathologie des
Alltagslebens,” 1904.) (Steckel, “Nervöse Angstzustände und ihre
Behandlung,” Berlin and Wien, 1908.)

Froeschel thinks the nucleus of stuttering lies in the psychic condition
of the patient who becomes conscious of the ataxically disturbed
speech movements. (“Lehrbuch der Sprachheilkunde,” Leipzig and
Wien, 1912.)

Nadoleczny held the exigencies of the first few school years as the
momentous factors of stuttering. (“Die Sprach- und Stimmstörungen
im Kindesalter,” Leipzig, 1912.)

A discussion of these various theories on the etiology of stuttering
will not be entered into, for the analysis and conclusion of the data
which immediately follow will show plainly the point of view this work
seems to warrant. These theories were cited rather to give a general
and hasty review of some of the literature on the subject. This
investigation is an attempt to determine something about the etiology
of stuttering through objective measurement. Through the Stanford
Revision of the Binet-Simon Scale an effort was made to answer these
questions:

What is the mental level of the stutterer?

What is the nature of his mental development, i.e., do his responses
show an even or scattered development? And finally, has he a special
word disability?

The median intelligence quotient (IQ) for this group of 62 individuals
was 92 per cent. This might be termed a “low” average intelligence
quotient, since 100 per cent is taken as the average for normal
intelligence. The variation of IQ’s within this group was 66 per cent
to 130 per cent.

To show the variation more plainly the following 8
groups are given:

Score  66- 70     8 individuals
Score  70- 76     8 individuals
Score  76- 86     8 individuals
Score  86- 96    13 individuals
Score  96-105     8 individuals
Score 106-110     6 individuals
Score 110-120     6 individuals
Score 120-130     6 individuals

Total, 8 groups  62 individuals

While it is true that this evidence warrants the conclusion that these
stutterers tended, on the whole, to be of low normal intelligence, the
range of intelligence must not be overlooked. For instance, in this
group there were 8 feeble-minded individuals, 8 borderline cases, 8
very low normal, 13 low normal, 8 normal, 6 above normal, 6 superior,
and 6 very superior. It is obvious from the variation within this
group that it is unwise ever to form an opinion on the basis of a
median measurement alone; it is the part of wisdom, however, to
suspend judgment until the range and groupings within the range are
known. The conclusion could not be drawn from these data that
stutterers, on the whole, have low normal intelligence, for as stated
previously the cases were comparatively few and selected. Had the
mental examination been made on a like number of cases in the diabetic
or tuberculosis wards, perhaps the same median IQ would have
been found. For it has been shown repeatedly by mental examination
that on the whole individuals who seek help from charitable institutions
tend to have low normal median intelligence quotients. Terman
through objective measurements on different social classes proved this
(“Measurement of Intelligence,” p. 72). Therefore, before we could
conclude that stutterers on the whole have low normal intelligence, we
would have to have a greater number of cases, and the cases would
have to be drawn from various social classes.

An attempt was made to answer the second question as to whether 
a stutterer has an even or scattered mental development. When 
Binet constructed his scale it was on the assumption that the intelligence 
grows in the same gradual way that the rest of the physical 
organism does. The regularity or evenness of physical development 
has been determined by an actual measurement of many children at 
different ages. By a comparison of these norms with the height or 
weight of a child it is possible to tell whether his physical development 
has been normal. If it has been, his height and weight will agree 
closely with the established norms. 

Binet, his followers, and especially Terman, who has standardized 
this scale for use in the United States, have found that intelligence 
does develop gradually and probably ceases at about the time the 
skeleton ceases to develop. When a child is found who measures 
much above the average mental level for his age, he is designated as 
superior in intelligence, just as a child superior to his norm for physical 
development is described as physically superior. While this even 
physical or mental development above one’s norm is generally a 
superior manifestation, an uneven development (i.e., markedly above 
one’s norm in certain measurements and markedly below in others) is 
symptomatic. For example, suppose a child is 8 years old, and his 
height is that of a 12-year-old child and his weight that of a child 6 
years old; or suppose again a child is 7 years old, has the weight of a 
9-year-old child and the height of a 6-year-old child. These individuals 
would be matters of great concern and means would be taken to find 
out the cause of this uneven physical development. 

As an illustration of uneven mental development, take the case of a 
boy 14 years old who recently came to this clinic for mental examination. 
He had reached the fifth grade by the age of 12; he failed on 
some of the tests at the third year level and had scattering successes 
up to the fourteenth year. This was a case of juvenile paresis, and 
disease had caused a definite organic deterioration of the nervous tissue. 

If no organic basis can be found for a scattering performance, the 
conclusion must be drawn that the disease is a functional one and the 
treatment falls within the scope of educational therapy. 

An individual with an uneven mental development, and consequently 
a poorly integrated nervous mechanism, is liable to emotional 
"upsets"; he is unstable, and his responses to the situations of the 
environment are liable to be of a bizarre nature. As a result of this, 
in time, such an individual often develops character defects of an 
anti-social nature. 

This evenness of mental development seemed an important thing 
to investigate in the case of the stutterer. For the emotional 
"upsets" and the unstable behavior in general of the stutterer are 
phenomena familiar to all who have observed him. 

To determine the evenness of mental development in the 62 stutterers 
upon whose record this report is based, the difference between 
the basal age and the upper limit was found for each case and then the 
median of these differences taken. The basal age is the point at which the 
subject passes all the tests, and the upper limit is the point where he 
fails in all of the tests. Due to the fact that some of these individuals 
were over 16 years old and others had superior intelligence, there were 
some successes at the 18-year level, which is the last scale in the Stanford 
Revision; consequently, the spread for these individuals is not a 
correct one, since their upper limit was not determined. However, 
there were only 13 such cases and they have been counted as though 
their upper limit were 18 years. The median spread in years for the 
group, from the point where all tests were passed to the place where 
none were passed, is 5 years. In other words, after finding a place 
where an individual could pass all the tests, it was necessary to continue 
through the scales for 5 additional years to find a point where he 
could pass no test at all. The groupings for this spread are as follows: 


    Number of Cases      Spread (years) 

           3                  2 
           7                  3 
          15                  4 
           7                  5 
           9                  6 
          12                  7 
           8                  8 
           1                  9 

    Total, 62 
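
The median of a grouped table such as this one can be checked by expanding it into one value per case. The sketch below does so under an assumed reading of the case counts (3, 7, 15, 7, 9, 12, 8 and 1 for spreads of 2 through 9 years), chosen because those counts total the stated 62 cases; the variable names are illustrative only.

```python
from statistics import median

# Spread in years -> number of cases. The counts are an assumed reading of
# the table, chosen so that they add up to the 62 cases reported.
spread_counts = {2: 3, 3: 7, 4: 15, 5: 7, 6: 9, 7: 12, 8: 8, 9: 1}

# Expand the frequency table into one spread value per case, in order.
spreads = [years for years, n in sorted(spread_counts.items()) for _ in range(n)]

total_cases = len(spreads)
median_spread = median(spreads)
print(total_cases, median_spread)
```

With 62 cases the median lies between the 31st and 32nd values, and under this reading both of them fall in the 5-year row.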

In giving this test to individuals of average intelligence or to those 
of inferior or superior intelligence and an even development, it is 
generally necessary to proceed 3, or at the most 4, years above the 
basal age to find the upper limit of their achievement. Terman, writing 
on this subject of scattering successes, says: 

"Why, it may be asked, should not a child who has 10-year intelligence 
answer correctly all the tests up to and including Group X, and 
fail on all the tests beyond? There are two reasons why such is almost 
never the case. In the first place, the intelligence of an individual is 
ordinarily not even. There are many different kinds of intelligence 
and in some of these the subject is better endowed than in others. A 
second reason lies in the fact that no test can be purely and simply 
a test of native intelligence. Given a certain degree of intelligence, 
accidents of experience and training bring it about that this intelligence 
will work more successfully with some kinds of material than 
with others. For both of these reasons there results a scattering of 
successes and failures over three or four years." 

Although an uneven intellectual development of 3 or 4 years 
above the basal age is a natural condition, a matter of 5, 6, 7, or 
more years above the basal age is sufficiently symptomatic to warrant 
an investigation. In the case of these 62 stutterers, if we take a 
spread of 3 years above the basal age as a normal condition, a 
spread of 4 years as a borderline condition, and a spread of 5 years 
or more as indicating an abnormal uneven intellectual development, 
we find the following groupings: 

10 with a normal uneven development 
15 with an uneven development of 4 years 
37 with an abnormal uneven development 

62 total 
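
The three groupings can be recomputed from the spread table. The sketch below assumes the cut-offs are a spread of 3 years or less (normal), exactly 4 years (borderline) and 5 years or more (abnormal), and it assumes the same reading of the case counts that totals 62; the function and label names are hypothetical.

```python
# Spread in years -> number of cases (assumed reading that totals 62 cases).
spread_counts = {2: 3, 3: 7, 4: 15, 5: 7, 6: 9, 7: 12, 8: 8, 9: 1}


def classify_spread(spread_years: int) -> str:
    """Label a spread using the assumed cut-offs of 3, 4, and 5 or more years."""
    if spread_years <= 3:
        return "normal"
    if spread_years == 4:
        return "borderline"
    return "abnormal"


groupings = {"normal": 0, "borderline": 0, "abnormal": 0}
for years, cases in spread_counts.items():
    groupings[classify_spread(years)] += cases

print(groupings)
```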

Here again is found an overlapping of results. This is always the 
case whenever any objective measurement is made of a comparatively 
random sampling of individuals. It is this constant overlapping of 
results which keeps an investigator from becoming dogmatic about his 
findings. But there is something more significant than maintaining 
the equilibrium of an investigator in this overlapping of objective 
measurements, and that is the continuity of any measurable trait, 
group of traits, or physical characteristic between any reasonably 
unselected group of individuals. No one group of individuals can be 
set apart in sharp contra-distinction to another; nature does not work 
that way. 

In the case of stutterers, while it is true that on the whole they show 
an uneven intellectual development, and that this condition is generally 
accompanied by emotional upsets, bizarre behavior, and later 
character defects, the fact that there has been a small overlapping of 
results must not be lost sight of. Successful therapy has always 
recognized the emotional disturbance of the stutterer. 

This leaves the third question to be disposed of: whether the 
stutterer has a special word disability. The vocabulary test of the 
Stanford scale begins at the eighth year. Obviously, then, the 
individuals under 8 years who had no superior intelligence could not 
be examined by the vocabulary test. Of these 60 there were individuals, 
ages 9, 10, and 11 years chronologically, who failed to pass 
the 8-year vocabulary test. It was thought best, however, for purposes 
of weighting to score them as having passed the 8-year test. 

By the elimination of the very young stutterers, the median 
chronological age for this group was 13 years, 6 months. The median 
mental age for the group was 11 years, 9 months. Thus there is a 
difference between the chronological age and the mental age of 1 year, 
7 months, while the difference between the chronological and the 
vocabulary age is 2 years, 7 months, a difference just twice as great. 
The difference between the mental age and the vocabulary age is 
1 year. Clearly then, on the whole, the stutterer has a word disability, 
and of such a nature that he is unable to overcome it in proportion to 
his intelligence. An examination of the scores showed this grouping. 
There were 42 individuals whose vocabulary age was below both the 
chronological and mental age. Then there were 6 individuals whose 
vocabulary age was above the chronological age yet below the mental 
age. These were individuals of very superior intelligence; we may 
perhaps assume from this that the word disability was so severe that 
it could not be overcome in spite of the superior intelligence. Unfortunately, 
the history could not be gotten of 1 case, the only one whose 
mental age and vocabulary age were equal, yet both below the chronological 
age. The marked variation found in the vocabulary measurement 
was the case of the following two stutterers: their vocabulary 
age was above both the mental and chronological age. Both of the 
mental ages, however, were above the chronological. One case was 
of a young child who played constantly with an older brother who 
stuttered. The father brought him to the clinic and explained that 
he wanted to “break him of the habit before it got worse.” The other 
case was a boy in high school and when asked how he happened to 
know so many words said, “I study the dictionary all the time, and 
when I read I write down all the words I don’t know and then look 
them up and try to use them. I get stuck when I try to talk so I 
thought if I knew lots of words I could always think of one when I 
felt I was going to get baulked." This boy has superior intelligence; 
he has sensed his own difficulty and has set out undirected to correct it. 
The conclusion to the third question seems warranted that stutterers 
do have a real word disability. The writers hope through persistent 
research to learn objectively something about the character of this 
word disability. 

At the beginning of this work the thesis was put forward that speech 
is a learned reaction, and that the process of learning to talk follows 
the laws of learning. For this reason acquiring speech is a developmental 
process and begins early in life. By objective measurement 
it has been shown that these stutterers had a low normal intelligence 
quotient, that they had an uneven mental development, and a word 
disability. 

The next problem to face is the one of etiology. If the laws of 
learning are recalled in this connection some additional light may be 
had. Man, to summarize the law, learns in accordance with his 
native capacity; he forms a bond between a situation and response; 
this bond must be exercised and accompanied with satisfaction if the 
habit be permanently formed. In the case of the 62 stutterers what 
will this mean? Since on the whole they had low normal intelligence, 
the bonds were formed more slowly and with greater difficulty than 
by an individual of normal or superior intelligence. These stutterers 
in forming speech habits were first handicapped by intelligence. In 
the next place, they were handicapped by an uneven intellectual 
development; this condition is generally accompanied by an unstable 
emotional and mental condition. The terms psychoneurotic, neurotic, 
psychopathic, hysteric, and constitutionally inferior are descriptive 
of this kind of an intellectual development. Those individuals generally 
lack perseverance and endurance; furthermore, they are liable to 
emotional upsets under trying and difficult circumstances. Finally, the 
stutterer has a word disability. The difficulty the stutterer encounters 
when he tries to learn to talk is now completely apparent. In the first 
place he forms the bonds involved in acquiring speech slowly and with 
difficulty. Moreover, in addition to this he has a word disability 
which makes it much more difficult to connect the proper response 
with the appropriate situation. It is plain to see that satisfaction 
could not follow the formation of a speech bond under the circumstances, 
and unless satisfaction follows the formation of a bond, it is 
not permanently formed. The picture of the stutterer is not completed, 
as tragic as it may now seem, for with this condition must be 
reckoned the final fact that he possesses an uneven intellectual development 
with its accompaniment of emotional disturbance. Obviously 
an individual with those limitations could not assiduously and with 
satisfaction apply himself to forming the bonds involved in the acquisition 
of speech. An individual does not know why he stutters and halts 
when he tries to say a word. This condition came on unconsciously 
and in early life when the speech bonds were being formed. A stutterer 
could no more explain his condition than a shell-shocked 
individual could his; in both cases the functioning had been below the 
level of consciousness. 

This investigation, however, up to the present is not sufficient to 
justify a theory on the inception of stuttering. Before this could 
be given, further work must be done on the exact nature of the word 
disability, and tests must be given to get definite evidence of the bond 
forming difficulties. Yet if the results of these objective measurements 
as they stand can suggest more helpful methods for the treatment of 
stutterers, the purpose of the work will be fulfilled. 



THE EFFECT OF THE STUDY OF LATIN ON THE 
ABILITY TO DEFINE WORDS¹ 

(Concluded from the November Issue) 

A. R. GILLILAND 
Lafayette College 

It was thought that a study of French and German might improve 
the ability of the student to define the given list of words. In order 
to discover any important effect of these languages, the scores of the 
students in each of the three largest groups (those with no Latin, 
those with 2 years and those with 6 years of Latin) were each arranged 
on the basis of the years of French the subject had taken. They were 
also arranged again on the basis of the years of German the subject 
had taken. While this method of determining the effects of the study 
of these languages is not conclusive proof, the results tend to show 
that they do not improve ability to define words appreciably more than 
the study of other subjects of the curriculum. 

Most investigations have shown that the A.B. student or the 
student who has selected Latin is superior in native ability to the B.S. 
student who has studied no foreign language or at least no modern 
language. The present study presents no exception to this rule. 
The average intelligence score on Army Alpha, classified on the basis 
of the number of years spent in the study of Latin, is given in Table II. 

Table II. Years of Latin 

                          0         2         3         4         6 

Army Alpha score        149   [illegible]   168.2     166.?     103.1 

A.D.                   20.4      14.4        11.3      22.7      17.2 

The average score for the class from which these groups were chosen 
was about 162. Therefore the groups chosen were representative of 
the intelligence of the class. Graph II provides a means of comparing 
the relative improvement in ability to define words of Latin origin 
with the intelligence score for each of the 5 groups studied. These 
graphs are constructed on the basis of the percentage of the possible 
total score on each test, and hence, since they are constructed on the 
basis of common units, they are directly comparable. For example, the 
no Latin group averaged 17.3 points credit out of a possible 60. That 
is, an average of 27.2 per cent correct responses. The same group 
averaged 149 points out of a possible 212 on Army Alpha, or 70.3 per 
cent of the total possible score. While the absolute standing of each 
group is not necessarily comparable (even though it might seem 
significant that none of the Latin scores are as high as the intelligence 
scores), it does give a means of relative comparison of the different 
groups. From a study of the graph it seems apparent that native 
ability does not account for even the larger part of the improvement in 
ability to define words of Latin origin. It may be that a smaller 
percentage of difference in intelligence score is more significant than a 
similar difference in score in defining words. No data are available on 
either side of this question. It seems highly improbable that this 
difference is great enough to account for even the major part 
of the difference found between the two scores. What has been said 
of the words of Latin origin holds true to a lesser degree for words of 
Anglo-Saxon and Greek origin. Surely, the study of Latin has a real 
value in improving ability to define words. 

[Graph. Showing the scores on the test in defining words of Latin, 
Anglo-Saxon and Greek origin.] 

[Graph. Showing the relative standing of groups with different amounts 
of training in Latin in intelligence, college standing and score in 
defining words of Latin origin.] 

¹ By a mistake in the editorial office these pages were omitted from 
Dr. Gilliland’s article in the November, 1922, issue. 

Table III gives the average college standing for the first semester 
of the freshman year for each of the 5 groups. 

Table III. Years of Latin 

                          0         2         3         4         6 

College standing       1.82      1.82      1.04      2.03      2.43 

A.D.                    .68       .06       .70       .75       .70 

The highest possible college standing at Dartmouth College, where 
this study was made (straight A's), gives 4 points credit. B's give 
3 points, C’s give 2 points, and D's give 1 point. The average standing 
for the class from which the groups were selected was approximately 
2.00 points. The average for the 97 cases of the 5 groups was 1.98. 
They were therefore representative in scholarship as measured by first 
semester standing. The averages for each separate group have been 
reduced to percentage of the highest possible score and placed on Graph 
II, making the college standing comparable with the intelligence 
score and the score on the test in defining words of Latin origin. 
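
The reduction to a percentage of the highest possible score, which is what makes the three measures comparable on one graph, is a single division. The sketch below illustrates it with two figures given in the text for the no Latin group: 149 points out of a possible 212 on Army Alpha, and a college standing of 1.82 out of a possible 4 points. The function name is a hypothetical one chosen for illustration.

```python
def percent_of_possible(score: float, highest_possible: float) -> float:
    """Express a score as a percentage of the highest possible score."""
    if highest_possible <= 0:
        raise ValueError("highest possible score must be positive")
    return 100.0 * score / highest_possible


# No Latin group: Army Alpha average of 149 out of a possible 212.
alpha_percent = percent_of_possible(149, 212)

# No Latin group: college standing of 1.82 out of a possible 4 points.
standing_percent = percent_of_possible(1.82, 4)

print(round(alpha_percent, 1), round(standing_percent, 1))
```

Once every measure is on the same 0 to 100 scale, the groups can be compared measure by measure, although, as the text cautions, equal percentage differences need not be equally significant on different tests.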

We find a much higher correlation between the college standing 
and the ability to define words of Latin derivation than between 
intelligence score and ability to define words of Latin derivation. In 
fact these results present very strong evidence for the reasonable 
assumption that one of the major differences between the groups is the 
greater application to study in the case of those spending a longer 
time on Latin. This application results in better grades, in general, 
and a broader knowledge of the meaning of words. This study does 
not prove whether the time spent in studying Latin might not better 
be spent in the study of the derivation and the meaning of words. 
Suffice it to say, too often the men who do not study Latin do not get 
such training in the study of words. The men spending 4 or 6 years 
in Latin may have more original ability in the subject. They have a 
slightly higher native intelligence. They also gain in ability to 
define words through a definite knowledge of the meaning of many 
words of Latin origin and through a better general understanding of 
how to analyze unfamiliar words. But above all the men spending 
several years in Latin are the men who take their college course most 
seriously and make good grades. 





THE DETERMINATION OF ABILITY FOR LEARNING 
TYPEWRITING 

W. W. TUTTLE 

Principal of University High School, University of South Dakota 

Vocational guidance has had a long history. In former times men 
attempted to control their objective environment or to determine their 
personal qualifications by magic, oracles, etc. Only within the last 
few years has there been a real scientific approach made to the subject. 
Even now men are tried out in the actual field in which they wish to 
work without any preliminary determinations concerning their ability. 
In many cases this method of selection ends in dissatisfaction on the 
part of all concerned. 

There is scarcely any doubt that much time is wasted by students 
who train in fields in which they do not have the capacity to work to 
the best advantage. If it is possible for the psychologist to guide 
individuals into the vocation for which they are most talented or keep 
them out of the vocation for which they have but little talent, he can 
render to society service of inestimable value. 

The Problem Stated 

There are numerous more or less isolated problems connected with 
the work of vocational guidance and each one has its own particular 
line of approach. The problem which is being investigated by the 
writer is the possibility of determining in advance ability for learning 
typewriting. The problem presents itself in this form: Do those who 
become adepts in typewriting possess certain types of native ability 
and can this native ability be measured? On the other hand, do those 
who are not able to acquire proficiency in typewriting show, when 
tested, the absence of these types of native ability? In this study 
certain assumptions relative to these questions have been made and 
tested. 

The following are, for the purpose of this study, considered to be 
the essential native traits of an efficient typist: 

(1) Quick motor action. (2) A keen sense of rhythm. (3) Ability 
to pay attention and be accurate. (4) A well developed memory span. 
(5) Ability to follow directions. (6) Ability to carry on the process of 
substitution. 

Methods have already been devised for measuring these special 
traits. It was found necessary, however, to devise new ones and to 
vary those already in use, in order to get tests suitable for this 
experiment. Group tests are used in all but one instance, but in this case 
the test is so easy to administer and check that it could be economically 
used in large classes. In all tests used the scoring is simple and can be 
done accurately. 

A class of 20 students beginning the study of typewriting forms 
the basis of this experiment. 

In order to determine the accuracy of this test for predicting 
typewriting ability, the results are compared with the typewriting grades 
made by the group at the end of the first semester. 

The Tests Used 

A description of the tests used, methods of administration and 
checking of results follows: 

1. Quick Motor Action. The object of this test is to determine one's ability to 
control the movements of the arms and hands. The test consists of tapping any 
key on the keyboard of a typewriter with any finger, as rapidly as possible. The 
number of the taps made by the subject was shown on the carriage scale. Each 
subject was given 5 trials, 6 seconds in duration, with each hand. 

2. A Keen Sense of Rhythm. This was measured by Seashore’s test for sense 
of time. The test consists of 100 combinations of 3 clicks, marking off two 
intervals of time of different length. These combinations have been recorded upon a 
phonograph record so that they can be reproduced before the group. 

3. Attention and Accuracy Test. This test is divided into two parts. The 
first part consists of [illegible] horizontal lines of 61 figures each in which occur 100 
combinations of two consecutive figures whose sum is nine. A copy of the test was 
given to each member of the group with instructions to underline all combinations 
of consecutive figures whose sum is nine. The time allowed was [illegible] minutes. 

The second part of this test consists of 11 horizontal lines of 57 letters each, in 
which occur 100 combinations of x and n. A copy of the test was given to each 
member of the group with instructions to underline all adjacent combinations of 
x and n. The time allowed was 3 minutes. 

4. Memory Span Test. This test is made up of two parts. The first part 
consists of 6 horizontal lines with a total of 45 abstract words. The first line contains 
5 words, each successive line increasing one word, making 10 words in line six. 
The words are read to the group, a line at a time, with an interval of 1 minute 
between readings. At the end of each 1-minute interval the group is instructed 
to write the lines of words as nearly as possible in the order in which they were read. 

The second part consists of 6 horizontal lines of 45 concrete words. The lines 
are constructed the same as in Part I. The total time allowed for Part I was 
[illegible] minutes and for Part II, 3 minutes. 

5. Ability to Follow Directions. This test involves four elements, each setting 
forth very definitely certain acts which are to be performed. The time allowed 
for this test was 10 minutes. 

6. Ability to Carry on the Process of Substitution. The substitution test consists 
of 9 combinations, each of which is composed of one of the digits and a symbol 
which represents it. For example, 1(, 27, 3j|t, etc. Each student in the group is 
given a key card containing the combinations of figures and symbols and another 
card containing only symbols. These symbols were arranged in 25 equal lines. 
The students were instructed to write opposite the symbols the corresponding 
figures, using the key as a guide. The time allowed was 6 minutes. 

Methods of Scoring 

1. Motor Action Test. The average number of taps made by the student in 
6 seconds, as indicated by the carriage marker on the typewriter, was used 
as the score. The highest score was 43.0 and the lowest was 28.4. 

2. Sense of Rhythm Test. In this test, one point was given for each correct 
response. The highest possible score is 100. The range in this test was from 64 
to 80. 

3. Attention and Accuracy Test. One point was given for each combination 
correctly underlined. The highest possible score is 100. The range in this 
experiment was from 80 to 100 for the first part and ?6 to 100 for the second. 

4. Memory Span Test. Two points were allowed for each word correctly 
reproduced in right order, and one point for each word correctly reproduced but in 
wrong order. The highest possible score is 90. The range in this experiment 
was from 36 to 71 for Part I and from 49 to 73 for Part II. 

5. Ability to Follow Directions. This test contains four elements. For the first, 
10 points are given; for the second, 5; for the third, 5; and for the fourth, 25. The 
time allowed was 10 minutes. The highest possible score is 45. The range in 
this experiment was 13 to 45. 

6. Ability to Carry on the Process of Substitution. Four points were allowed for 
each line correctly substituted. The highest possible score is 100. The range in 
this experiment was from [illegible] to 100. 

The total score for all tests used is found by taking their average. 
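
The memory span rule (two points for a word in right order, one point for a word recalled out of order) can be sketched as follows. The sketch rests on two assumptions that the article does not state: that "right order" means the word is written in the same position it held in the line as read, and that each part has six lines of 5 to 10 words. The word lists are hypothetical.

```python
def score_line(read_words, written_words):
    """Score one recalled line: 2 points for a word written in the position
    in which it was read, 1 point for a read word written elsewhere."""
    score = 0
    remaining = list(read_words)  # each word read can be credited only once
    for position, word in enumerate(written_words):
        if word not in remaining:
            continue  # words that were never read earn nothing
        remaining.remove(word)
        in_place = position < len(read_words) and read_words[position] == word
        score += 2 if in_place else 1
    return score


# Six lines of 5, 6, ..., 10 words give 45 words, each worth at most 2 points.
highest_possible = 2 * sum(range(5, 11))

# Hypothetical line and response (not material from the study).
read = ["truth", "honor", "faith", "doubt", "mercy"]
written = ["truth", "faith", "honor", "doubt"]
line_score = score_line(read, written)
print(line_score, highest_possible)
```

A student's score on a part would then be the sum of `score_line` over its six lines.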

After the tests described above were given and the results checked, 
a typewriting test was administered. In order to determine whether 
the marks made by the students in the above mentioned tests were 
indicative of the results of the typewriting test, the two sets of marks 
were correlated. The following are the results: 

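                                                       Coefficient of 
                                                        Correlation 

1. Motor action and typewriting test ......................... .64 

2. Sense of rhythm and typewriting test ...................... .10 

3. Attention and accuracy test and typewriting test: 

      Part I ................................................. .41 

      Part II ................................................ .68 

4. Memory span and typewriting test: 

      Part I ........................................ negative .30 

      Part II ....................................... negative .11 

5. Ability to follow directions and typewriting test ......... .17 

6. Substitution test and typewriting test .................... .6? 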
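
The article does not say which coefficient of correlation was used. On the assumption that it was the product-moment (Pearson) coefficient, the computation for one pair of tests might be sketched as below. The function name is illustrative and the two score lists are hypothetical, not data from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dev_x = [x - mean_x for x in xs]
    dev_y = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dev_x, dev_y))
    sum_xx = sum(a * a for a in dev_x)
    sum_yy = sum(b * b for b in dev_y)
    if sum_xx == 0 or sum_yy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sum_xy / sqrt(sum_xx * sum_yy)


# Hypothetical scores for five students (not data from the study).
tapping_scores = [43.0, 39.5, 36.0, 31.2, 28.4]
typewriting_grades = [68.0, 66.5, 61.0, 62.5, 55.0]
r = pearson_r(tapping_scores, typewriting_grades)
print(round(r, 2))
```

A coefficient near zero, such as those reported for sense of rhythm, means the test tells little about the typewriting grade; a negative coefficient, as for memory span, means higher test scores tended to go with lower grades.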

The coefficients of correlation show that sense of rhythm and 
ability to follow directions are of but little significance in indicating 
ability to learn typewriting. Memory span is shown by the coefficient 
of correlation to have no direct relation to ability to learn typewriting. 
The coefficients of correlation indicate, however, that motor 
control, ability to pay attention and to be accurate, and ability to 
do substitution are indicative of capacity to become efficient in 
typewriting. 

TABLE I 

               Student    Typewriting      Student    Total score 
               number     test grade       number     in this study 

Group I           1          6?.4             1           84.4 
                  2          67.2             ?           81.4 
                  3          6?.?             ?           81.2 
                  4          66.7             2           80.4 
                  5          65.0             4           80.0 

Group II          6          64.6            1?           79.3 
                  7          63.1            11           79.1 
                  8          62.?            1?           78.8 
                  9          62.4             ?           78.0 
                 10          61.8            13           77.8 
                 11          61.7             7           77.7 
                 12          61.4             ?           77.0 
                 13          61.3            20           76.0 

Group III        14          60.7             ?           74.7 
                 15          60.4            1?           74.4 
                 16          60.0            17           72.8 
                 17          60.0            16           71.6 
                 18          5?.6            12           71.3 
                 19          58.0            14           64.0 
                 20          49.8            1?           62.5 

Since the tests show that the capacity to follow directions, memory 
span and sense of rhythm are of no consequence in the learning of 
typewriting, they are not considered further. 

In order to determine the relation which exists between the tests 
for motor action, ability to pay attention and be accurate, and 
ability to do substitution and the actual accomplishments in 
typewriting, the total score for these tests was correlated with the scores of 
the typewriting test. The coefficient of correlation between the scores 
for the typewriting test and the scores of the above mentioned tests 
was found to be 0.?21. 

Table I further shows the relation between the data which were 
correlated. 


Conclusions 


1. The data indicate that certain native abilities are closely 
related to capacity for learning typewriting. The students who receive 
high grades in the above mentioned tests also receive high grades in the 
typewriting test. 

2. The grades made in the above mentioned tests and the 
typewriting test grades when compared according to rank were placed into 
three groups as indicated in Table I and Table II. 


Table II 

                 Typewriting       Tests Used in 
                    Test            This Study 

Group I            65-70              80-85 
Group II           61-65              75-80 
Group III          49-61              60-75 

In Group I, Table I, 5 students were placed by the typewriting 
test. Of those 5, 4 were found in Group I of the above mentioned 
tests. The remaining one was found in Group II of the above 
mentioned tests. 

In Group II, Table I, 8 students were placed. Six out of this 
number were placed in Group II of the above mentioned tests. Of 
the remainder of those placed in Group II by the typewriting test, 
both fall in Group III of the above mentioned tests. 

In Group III, Table I, 7 students are placed by the typewriting test. 
Of these 7, 5 are found in Group III of the above mentioned tests. Of 
those remaining in Group III of the typewriting test, 1 is found in 
Group I, and 1 in Group II, of the above mentioned tests. 

3. The correlation between the scores as indicated by the typewriting
test and the above mentioned tests is sufficiently high to
indicate a strong relation between capacity to learn typewriting and
the tendencies tested. This is also shown by Table II.

4. The results of the above mentioned tests when compared with
the results of the typewriting test indicate that those making a mark
of 80 or above in the above mentioned tests will become excellent
typists; those making from 75 to 79 inclusive will be but average
typists, while those making below 75 will either be poor typists or
fail entirely.
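
The rule stated in conclusion 4 amounts to two cut-off marks. A minimal sketch in Python follows, assuming the middle band runs from 75 up to, but not including, 80. The function name predicted_typist_level and the label strings are hypothetical, and the cut-offs rest on a small sample, so the sketch illustrates the rule rather than a validated selection procedure.

```python
def predicted_typist_level(mark: float) -> str:
    """Map a combined test mark to the level suggested in conclusion 4."""
    if mark >= 80:
        return "excellent"
    if mark >= 75:  # assumed band: 75 up to, but not including, 80
        return "average"
    return "poor or failing"


# Hypothetical marks, one from each band.
for mark in (83, 77, 68):
    print(mark, predicted_typist_level(mark))
```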

The rather limited scope of this study scarcely justifies final
conclusions but the results indicate beyond any question that the
possibilities along this line of study are great.



NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 
OTHER MAGAZINES 


REPORTED BY CECILE COLLOTON
Department of Educational Psychology, The Lincoln School of Teachers College

Intelligence Tests

Mental Yardsticks in Question. O. H. Mathes. Education, 1923, February,
342-361. Another question as to the reliability of the army tests. Discusses
the need for recognizing other important qualities in addition to native intelligence.

The Psychological Test Versus the Teacher's Judgment. Charles H. Sampson.
Educational Review, 1923, January, 16-17. Shows why the psychological
examination is of more value than the teacher's judgment.

The Intelligence of Pupils Who Repeat. H. T. Eaton. School and Society,
1923, February 3, 130-140. A study of repeaters in special classes shows that
only 51 per cent of the failures are determined by lack of mental ability.

Leadership in Relation to Intelligence. H. S. Bennett and B. R. Jones. The
School Review, 1923, January, 126-128. A study of 20 pupils attending the
Rochester Shop School seems to indicate that low intelligence bars a person from
leadership. Intelligence was rated by Otis Group Intelligence Scale. Estimate
of ability as leaders was the combined judgment of the instructors, the principal,
and the athletic director. Six individual cases are described.

Intelligence of Teachers in Training — Effects with Intelligence Tests. Samuel
Renshaw. Journal of Educational Research, 1923, January, 28-30. Summarizes
the evidence on the intelligence of normal school students. Gives new data on
1190 students in the second largest of Michigan's normal schools tested with
Form 6 of Army Alpha. Shows practice effect of test-taking.

Educational Tests

An Omnibus Achievement Test for the Upper-Elementary Grades. C. W. Odell.
The Elementary School Journal, 1923, January, 353-368. Discusses the results
of a series of 16 true-false exercises in geography, history, elementary science,
grammar, and certain phases of arithmetic based on the Chicago course of study.

The Influence of First-year Latin upon Range in English Vocabulary. Professor
E. L. Thorndike. School and Society, 1923, January 20, 83-84. The Thorndike
Test of Word Knowledge administered to all first-year pupils in 60 high schools,
three times during the school year, shows a slight superiority of the Latin group.
Four tables give detailed data.

The Influence of First-year Latin upon Ability to Read English. Professor E. L.
Thorndike. School and Society, 1923, February 10, 165-168. A comparison of
the scores of Latin and non-Latin groups on the Thorndike-McCall Reading Scale.
Eight tables present the data. Data insufficient for reliable conclusions.

Knowledge of English among Normal School Students. Lawrence Augustus
Averill. School and Society, 1923, January 13, 53-60. Results of a set of
questions given to 463 young women just entering normal schools. Answers to each
question are tabulated and reported in detail.

Does Knowledge of Formal Grammar Function? William Asker. School and
Society, 1923, January 27, 109-111. Tests given to 296 freshmen in the
University of Wisconsin show that the study of formal grammar has little influence
on ability to judge the grammatical correctness of a sentence or ability in English
composition.

Study of Depth and Rate of Comprehension in Reading by Means of a Practice
Experiment. Arthur I. Gates. Journal of Educational Research, 1923, January,
37-48. Describes an experiment in which the Burgess test was used 6 days, 5
minutes per day and the Thorndike-McCall for 30 minutes per day for 7
consecutive school days. Thorndike-McCall is more reliable than the Burgess. Various
forms are not of equal difficulty.


Miscellaneous

The Superior Child in Our Schools. Clara Harrison Town. Educational
Review, 1923, January, 17-21. Discussion of the problems involved in choosing
superior children, planning a program for them, and carrying out this program.

The Standardization of Tests and Scales. Peter Sandiford. Journal of
Educational Research, 1923, January, 14-20. A discussion of the problems involved in
the standardization of tests shows need of restandardization of American tests for
Canadian schools. Eight tables give detailed data.

Principles Underlying Grading. Florence Y. Humphries. The English
Journal, 1923, January, 33-38. Argues for a standardization of marks by (1)
determining a minimum requirement and (2) agreeing on a definite interpretation of each
mark above the minimum.

Amy in Eitantinalimw. Elbridge Colby. Educational Review,
1923, January, 7-0. Describes types of special examinations used at the Infantry
School at Fort Benning, Georgia.

The Will to Learn. William F. Book and Lee Norvell. Pedagogical Seminary,
1922, December, 305-362. "An Experimental Study of Incentives in Learning."


CONDUCTED BY LAURA ZIRBES¹

1. Is the Intelligence of the American Nation on the Decline?—The
intelligence examination of over a million men in the army during the
recent war was a feat of which American psychologists may well be
proud. The significance of the results of the tests is great. Many
interpretations, scientific and popular, have been essayed. The final
verdict as to their real meaning has not yet been made. But here
comes a psychologist who has studied one aspect of the intelligence
scores thoroughly and carefully, and who draws far-reaching and
important conclusions from this study.² In brief, he concludes that
the intelligence of the people of this country has been declining for the
past 60 years or more. This is due to the two factors of immigration
and race mixture. The Nordic stock is more intelligent than the
Alpine or Mediterranean, and our immigration has progressively
increased the proportion of the two latter races. The race mixture
now going on tends to lower the average of the whole nation, not alone
by mixture of those European stocks but also by a still more insidious
mixture with the negro race.

This thesis is carefully worked up to by a logical and careful
analysis of the results of the army tests. The native draft is compared with
the foreign-born. The foreign-born men are further divided into the
various countries of origin and the differences between the average
scores of these groups presented. Most important for the author's
thesis is the negative correlation between intelligence and length of
residence in this country. This decrease in score is not due to the
handicap of language as one might at first be inclined to conclude. It
would seem to be a real decline in intelligence. The nationality groups
that fall lowest are Russia, Italy and Poland. A large percentage of
recent immigration has been from these countries.

¹ Unsigned reviews which appear in this department are prepared by Laura
Zirbes.

² Brigham, C. C.: "A Study of American Intelligence." Princeton University
Press, Princeton, 1923, pp. 210.

The most novel part of the study comes in the author's attempt to
estimate the relative proportions of Nordic, Alpine and Mediterranean
blood in the various countries of Europe, and then to calculate the
proportions of each of these three racial groups among the immigrants.
How accurate these estimates are it is impossible for the reader to
judge. The author is frank to acknowledge the difficulties involved
and he tells us that if we do not like his figures, we may make our own
estimates and re-calculate the figures for the immigrants. But even
suppose we do accept the author's estimates, we may yet object to his
implications in Section X, where he jumps from the calculated
intelligence of the Nordic, Alpine and Mediterranean races as represented
in this country to the conclusion that these differences in intelligence
also hold for the three races in Europe. Hence the superiority of the
Nordic stock. But this assumes that immigration has been absolutely
non-selective, that the representatives of the three racial groups in
this country are random samplings of the racial groups in Europe, and
this is certainly open to question. Nevertheless, this assumption
does not vitiate the main argument of the study.

Whether we are prepared to agree with all of Dr. Brigham's
statements or not, we shall certainly be in hearty agreement with him when
he demands a more selective policy for future immigration and a more
vigorous method of dealing with the defective strains already in
this country. Dr. Brigham has presented clearly and logically an
immensely interesting topic. His book should be read not only by
professional psychologists, sociologists and educators, but by all thoughtful
men and women who have the future of this country at heart.

R. P. 


2. A Psychological Presentation of a Course in Psychology.¹—Surely
we should expect to find psychological principles applied in the
organization of a course in psychology and in a textbook presenting materials
for such a course, if indeed such principles are applied in any field.
In the preface of this book is a list of generally accepted principles
fundamental to good teaching. In the following pages are 23 lessons,
not chapters, embodying in a surprising manner those very principles.
We quote from the preface after noting that the organization of the
book is true to the statements there set forth:

¹ Strong, E. K., Jr.: "Brief Introductory Psychology for Teachers."
Baltimore, Warwick and York, Inc., 1922, pp. xi + 241.

"Instead of beginning with the most uninteresting phases of psychology and
those most unknown to students, the course takes up concrete experiences of
everyday life, relates them to the problems of learning, individual differences, and
influencing others, and so develops these topics. Each general principle is
discovered by the student out of his own experience in solving specially organized
problems. Only after he has done his best is he expected to refer to the text and
by then the text is no longer basic but only supplementary, clearing up
misunderstandings and broadening the whole viewpoint. . . . The student is
immediately introduced to problems of behavior taken as a whole and only after he is
fairly familiar with psychological procedure, terminology and point of view, is he
given his psychological background. . . . Each topic is handled as follows:
(1) The student performs an experiment illustrating the principle to be
emphasized, (2) he solves the problem as best he can and hands in his report, (3) he has
the benefit of a class discussion upon the subject at the next class-hour, (4) he
reads over what the author has to say on the subject, (5) he receives back his own
corrected paper on the subject, (6) he reviews the subject later on. . . . The
text is printed as a book and in the form of 23 booklets. . . . This is important
as the odd numbered lessons contain the answers to most of the problems. Where
students read ahead they lose the training resulting from working problems out
for themselves."

Turning to the lessons, one is at first disappointed to see that the
first lesson begins with the caption "What is Psychology?" How does
that carry out the promise of the preface? But preceding the first
definition is a long quotation from Booth Tarkington's "Penrod,"
followed by 7 questions on the "behavior" of Penrod, Mabel and Sam,
and then by the question "Can this be psychology?" Another
interesting paragraph and we come to the following: "The following
definition is just to aid the reader in orienting himself. Only toward the
end of the course will he be prepared to grasp its full meaning.
Psychology may best be defined as the science of behavior. There is the
definition. The matters dealt with in the next 10 sections will give
some of the various fields included in its bounds." And then we find
10 carefully selected incidents from real life, each followed by a series
of questions giving the beginner a notion of the scope of the field and
the further content of the course.

More conclusive than any a priori judgment would be the results
of a carefully controlled experiment in which the effectiveness,
organization and content of this and other "Introductory Courses in
Psychology" were compared.


3. On the Theory of Educational Measurement.—The appearance
of this well pruned volume of 300 pages on the theory of educational
measurements¹ is one indication that the testing movement is losing
the exhilaration of an uncontrolled amateur sport and is taking on the
sobriety and hard-headed sanity of actual scientific work. Serious
students of measurement, research workers, and other school people
interested in the subject will appreciate what Professor Monroe has
done here to hasten the change. The theory presented in this book is,
if one may use the expression, of a very practical sort. It has to do
mainly with such things as the construction of tests and scales, the
types of pupil performances susceptible of measurement, the meanings
of scores and norms, the validation of testing instruments, and the
technique of the application. There is not much of the material in
the chapters devoted to these topics that is strictly new, but it has been
collected from many scattered and inconvenient sources and subjected
to new organization, classification, definition, and critical
interpretation. The result is a very intelligible treatment of questions that test
makers and test users desire very much to understand. Incidentally
a good deal of the fiction with which early enthusiasm invested
measurement is disposed of.

The most elaborate discussion of any topic is that given to The
Critical Study of a Test. A long outline states the points on which a
test is to be judged. The main divisions are headed as follows: Facts
Title; Nature of Pupils' Performances; Description of Pupils'
Performances; Functions of the Test; Validity of the Test; Validity of
Significance; Practical Considerations. The subsequent discussion is
admirable from the standpoint of the academic experimentalist but a
little discouraging to a teacher or supervisor who might consult it for a
quick way to size up a test. A rather more practical than theoretical
chapter on the improvement of school examinations presents the case for
such new types as the true-false, the "yes" and "no," the recognition,
and the completion exercises. The suggestions here should have an
appeal for any teacher who believes in examinations at all. Two
concluding chapters on statistical methods remind us again of defects
in our early mathematical training and are consequently needed and
helpful.

All chapters are provided with good lists of questions and topics
for investigation as well as extensive bibliographical references.
The make-up of the book is such as to make it a decidedly convenient
text for advanced pupils in education and somewhat of a manual of
reference for practical workers in measurement.

M. H. Willing.

Springfield, Illinois.

¹ Monroe, Walter Scott: "An Introduction to the Theory of Educational
Measurement." New York, Houghton Mifflin Co., 1923, pp. 360.


4. A Survey of Methods of Judging Human Traits.—This book,¹
written primarily for the general reader, is a non-technical presentation
of the present status of methods used in the appraisal of human
character. Acknowledging the incompleteness and lack of finality which
characterizes the present accomplishment, the author, nevertheless,
considers it desirable to bring together into one volume descriptions
and accounts of the various lines of development which may serve as
starting points for further inquiry and, at the same time, to indicate the
practical implications and applications of such measurements. The
inadequacies and limitations of each method are set forth in turn.
The text is accompanied by numerous tables, charts and illustrations.
The book could be used with profit as an introductory text in a course on
mental measurements. Appendix B, following the classified
bibliography, contains 16 exercises or experiments for use in such a course.


5. The Case Method in School Management.—If one turns to this
book² to find another systematic and logical treatment of psychology,
philosophy, and the principles of class management and school
administration, he will be disappointed. If he expects to find any proposed
solutions of teacher problems that are strikingly new, he will scarcely
find them here. Its contribution—and that it is a real contribution
should be asserted in unmistakable terms—is in its concreteness and in
its development of principles of school management through the study
of concrete school situations.

¹ Hollingworth, H. L.: "Judging Human Character." New York, D. Appleton
and Company, 1922, pp. xiii + 208.

² Stark, W. E.: "Every Teacher's Problems." American Book Company,
New York, 1922, pp. 308.

Two hundred and forty-one practical problems are presented in 15
chapters dealing with discipline, subject matter and method, individual
differences, economy of time, health, relationship of teachers with each
other, with supervisors and administrators, and with parents, and
finally, with professional growth. The effectiveness of the plan for
presentation of school problems is illustrated by the three problems
discussed in a typical chapter. In Chapter IV, dealing with children's
attitudes, self-direction and ideals, 3 situations are described as
follows:

Problem 52 (p. 67). A boy defaces the school building by writing on the
plaster walls. On entering the school, the teacher observes a group of children
examining the scrawl. A boy remarks jokingly, "That looks like your writing,
Tom." Tom replies: "Sure, it's my writing."

The teacher asks him if he really did it, and he says again, "Sure."

Problem 53 (p. 60).

Junior High School
Office of the Principal

Notice to Teachers

The subject of the next teachers' meeting will be
Conduct during Intermission Periods

Mr. Evans will report on his day of observation in the H. School, where
pupils pass from one recitation room to another without supervision and without
forming in files. He recommends that we adopt the same plan. Come prepared
to discuss this proposal.

Edw. B. Jackson,
Principal.

Problem 54 (p. 67). In marking a set of examination papers, a teacher notices
a peculiar mistake and, a little later, she finds exactly the same error in another
paper. She therefore compares the two papers and finds that parts of them are
almost identical. The similarity is too perfect to be accidental and, since the
abler student sits in front of the other, she is forced to conclude that both have
shared in the deception.

An account is given of the active process of solution of these
problems in which teachers, principals, superintendents and parents
take part. Into this group of searchers for solutions the reader is
invited, and the reviewer found himself accepting the invitation pretty
regularly. Then follow a dozen or so other practical "problems for the
reader to solve," and the reviewer is inclined to believe that the readers
will pay more serious attention to them than it is usual for problems
stated at chapter endings to receive.

Throughout the book Mr. Stark has presented common sense
solutions which show keen insight into the practical difficulties that
arise in every active school system. Such insight comes from a
successful and active experience in practical school situations, from a
scholarly interest in educational psychology and philosophy, but most
important, from a sympathetic understanding of how a youngster
reacts when he feels unjustly treated, and how the young teacher feels
who senses the fact that she is not doing well. It is a progressive and
safe guide for practical school people who feel the wisdom of thinking
through the problems they are called upon to solve.

Philip W. L. Cox.

Lincoln School of Teachers College.


6. Scientific Technique Applied to Textbook Selection.—This little
volume¹ furnishes two examples of scientific procedure in the evaluation
of textbooks. Following an introduction by Ernest Horn, the authors
have set forth numerous criteria or principles of textbook selection.
They stress the superiority of objective evidence over subjective
opinion but they admit the practical impossibility of securing objective
data in every case and suggest numerous safeguards which increase the
dependability of judgments when they must be used.

The application of the criterion of interest is demonstrated in
Knight's study of High School Literature Texts. The purpose of
the study is so laudable and the technique so painstaking that we are
inclined to reserve criticism. But we must ask, do not the various
criteria impinge upon each other? Is not the application of a single
criterion therefore a mere first step which leads nowhere if not followed
by as careful an investigation of other important considerations?
Furthermore what are the factors of experience with a literary text
which are components of the good "after-taste" designated as interest
in this study? The possible effects of various methods of teaching
are merely mentioned. Uhl's study would lead us to believe that
interest is highly dependent on a number of more or less measurable
factors. Does statistical reliability of the results in one phase of a
problem excuse us from the consideration and solution of all other
phases?

The following quotation makes us wonder whether one must
necessarily forswear aesthetic realities in order to be scientific. "After all,
literary merit or any of its synonyms contains but little useful truth.
It is similar to such phrases as the Well-rounded Life, Education for
Complete Living, True Culture, Good Citizenship and the like. . . .
These are verbal fog screens concealing chiefly differences of opinions."
In spite of the lack of agreement as to literary merit in the data, there
is room for other conclusions.

¹ Franzen, R. H., and Knight, F. B.: "Textbook Selection." Baltimore,
Warwick and York, Inc., 1922, pp. 94.

Franzen's chapter setting forth the application of the criterion of
comprehension to geography texts does not pretend to be more than
an illustration of the technique. It does not eventuate in the
recommendation of a specific text or group of texts. It does point out the
necessity for considering more than one criterion. It does apply a
single criterion in a thoroughgoing fashion and analyzes the results
in a scientific and practical manner. The illustrative value of the
study is not impaired by the absence of didactic recommendations.
We turn back to Chapter I and wonder who will finish the good work
and illustrate the application of all the criteria there listed to a single
problem.


Briefer Mention

7. What is the "New Psychology"?¹—In this case it is analytical
psychology. The term has been avoided because of the limits to its
application laid down by the Freudian School, with which the writer
is not in entire agreement. It is a stimulating account of some
elementary facts, usually treated under the head of psychoanalysis, and
is designed to help the teacher in the solutions of personal and
professional problems.


8. Psychological Guidance in the Problems of Child Nurture.—This
book² was written to interpret the discoveries of modern child study
to parents. In order that all the agencies that contribute to the child's
development and education may do their full part, it is essential that
the efforts of home and school be coordinated. Teachers and parents
will welcome a clear, straightforward presentation of the fundamental
factors in the successful upbringing of children. Planned for use in
Parent Training Classes and similar groups, the numerous illustrative
incidents, the problems listed as "Suggestions for Further Study,"
the practical suggestions, the correlated references, and comprehensive
bibliography should make interesting material for study groups.

¹ Miller, H. Crichton, M.D.: "The New Psychology and the Teacher." New
York, Thomas Seltzer, 1922, pp. 226.

² Baker, Edna Dean: "Parenthood and Child Nurture." New York, The
Macmillan Company, 1922, pp. xvii + 178.

9. Applied Psychology in Arithmetic Practice Exercises.—This
pamphlet¹ should not only be used as intended by its authors.
Students of Educational Psychology and practice teachers should be
referred to it as illustrating well motivated drill, systematic practice,
the pressure of a time limit, economical scoring and learning technique,
practical provision for individual differences, definable objectives,
supervised study, and last but not least, experimentally evaluated
curriculum material.


10. A Helpful Book on Sex Education.—Those who have followed
the work of the American Social Hygiene Association will wish to read
the revised edition of Dr. Galloway's book² on problems of sex. The
book is more than its title implies. It is not a mere descriptive text,
as a few selected chapter heads will prove: Some Principles Which
Must Guide Sex Instruction; The Mental, Social and Moral Bearing
of Sex; Time and Manner of Instruction; Graded Problems and
Projects in Sex Education. We know of no other book in this field so
rich in constructive suggestions.

¹ Schorling, Raleigh and Clark, John R.: "Practice Exercises for Accuracy
and Speed in the Fundamentals of Arithmetic." Preliminary Edition, Yonkers,
N. Y., The Gazette Press, 1923, pp. 32.

² Galloway, T. W., Ph.D.: "Biology of Sex." New York, Heath, 1922, pp.
xxiii + 110.


"POWER" VS. "SPEED" IN ARMY ALPHA

G. M. RUCH
and
WILHELMINE KOERTH
State University of Iowa

Introduction.—The recent controversial literature on intelligence
testing has raised anew certain questions concerning the validity of
so-called speed tests for purposes of mental measurement. The
experimental evidence bearing on the general question of speed versus
power tests is unfortunately very meager. The most important
study hitherto reported was one carried out during the war by Dr.
Mark A. May under the direction of Dr. Lewis M. Terman in which the
correlation was computed between scores earned on Army Alpha during
regular time limits with those earned during double time limits.
The cases involved numbered 510 and the coefficient of correlation
was found to be 0.965. The conclusion drawn from this investigation,
in the words of the report,¹ was that "In general, then, we have no
reason to assume that an extension of time limits would have improved
the test or have given an opportunity to many individuals materially
to alter their ratings." As the writers of the report point out, this
result is to be interpreted as indicating that, although the absolute
scores are admittedly raised by the extra time allowance, the rank
orders of the 600 men are not markedly changed by the additional
time. Further reference will be made to this army study after the
new data have been presented.
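
The distinction drawn here, that every score may rise while the rank order stays nearly the same, can be shown with a small worked case. The sketch below uses Spearman's rank-difference coefficient, which is a substitute chosen for illustration and not necessarily the coefficient computed in the army study. The function names and all scores are invented, and the simple formula assumes no tied scores.

```python
def ranks(values):
    """Rank values from 1 (lowest) upward; this sketch assumes no ties."""
    if len(set(values)) != len(values):
        raise ValueError("this sketch does not handle tied scores")
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]


def spearman_rho(x, y):
    """Rank-difference correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Hypothetical scores for five examinees under the two time allowances.
regular_time = [45, 80, 120, 150, 95]
double_time = [70, 115, 160, 170, 110]

every_score_rose = all(d > r for r, d in zip(regular_time, double_time))
rho = spearman_rho(regular_time, double_time)
print(every_score_rose, round(rho, 2))
```

In this invented case every examinee gains from the extra time, yet only two adjacent examinees exchange places, so the rank correlation stays high.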

¹ Psychological Examining in the United States Army. Memoirs of the National
Academy of Sciences, Vol. XV, 1921, p. 416.

Psychologists have been rather generally agreed that time limits,
if not too strictly drawn, do not work an undue hardship on the
subjects taking mental tests in the great majority of cases. The fact
that very few crucial experiments have actually been carried out does
not necessarily imply that there are insufficient grounds for this belief.
The evidence is, however, of a more indirect nature than one might
wish. It rests on a wide variety of fairly well established facts such as:

1. The fact that there has been shown to exist moderately high, or
high, correlation between speed and accuracy in tests of mental
capacities like arithmetical abilities, naming of opposites,
substitution tests, etc.

2. The fact that repetitions of the same tests give fairly uniform
results; which, in other words, means that they show a
satisfactory amount of self-correlation or reliability.

3. The fact that under practice individuals tend to gain, from
period to period, in a consistent manner so that there exists
high correlation between initial and subsequent performances.

4. And finally, the fact that the time limits for all carefully
standardized tests have been experimentally determined in such a
way as to guarantee that the majority of subjects do have time
to complete all or nearly all of the test items which are within
their capacities.

Such evidences are real even if admittedly too indirect to convince
everyone. It is with the hope that additional direct evidence might
be brought forward that the authors have undertaken to extend
and otherwise supplement the previously reported work with the
Army Intelligence Examination Alpha. The present study has also
its shortcomings which will be pointed out as the occasions arise in
the discussion.

The General Nature of the Investigation.—With the cooperation
of the Dean of the College of Liberal Arts of the State University of
Iowa, the writers called in for the purposes of this experiment 122
freshmen who had previously taken the regular entrance intelligence
examination during October, 1922. This group was made up as
follows:

(a) Seventy students who earned percentile scores falling in the
lowest decile of the total distribution of scores, i.e., between
the first and tenth percentiles.

(b) Fifty-two students who earned percentile scores falling within
the highest decile of the total distribution of scores, i.e.,
between the ninetieth and one-hundredth percentiles.

This group of 122 students are quite representative of the greatest
extremes of talent of the entire freshman class, although, of course,
constituting a highly selected group in comparison with an unselected
adult population. The sexes are approximately equally represented.
The percentile ranks were based upon the combined scores of four
intelligence tests as follows: the Thorndike Intelligence Examination
for College Entrance, Part I (two forms and the practice form),
Morgan's Test of Mental Ability, and the Iowa Comprehension Test.
The total working time for this battery of tests is somewhat more than
two hours. The actual scores of the groups used in the investigation
are not reported here because the previous test scores were only used
to select the two groups already described. These groups will
hereafter be referred to as the "high" and "low" groups.

The detailed description of the experiment follows. The students
were requested to report to one of the large assembly rooms at 3 P. M.,
and at that time a brief explanation of the purpose of the meeting was
given by the Dean of the College of Liberal Arts. They were simply
told that at the time of the Fall examination some of them had complained
that their scores were too low because they were not given sufficient
time to complete the tests, and that, feeling that there might be some
truth in their belief, we had decided to give them a second chance on
another test in which they would be given all the time that they cared
to use. They were instructed further not to surrender their papers
until they were absolutely satisfied that no further amount of time
would raise their scores by as much as one point. Every effort was
made to induce every subject to attempt each item in the entire test.
This was not quite realized because in a very few cases students
refused to record their guesses and attempts on items that they felt
were entirely too difficult for them. The effect of these instructions
was to reassure any nervous subjects and to create a good working
spirit.

The three stages of the experiment will be described in succession.

Period 1.—The entire group was given Form 7 of Army Alpha
under the strict procedure of the Examiner's Manual. During this
period of the test all of the subjects were required to work with
ordinary black-leaded pencils. No variations from the standard procedure
were allowed, in order that the results would be strictly valid for
comparison with the norms for Army Alpha. At the end of the test the
black pencils were collected.

Period 2.—Blue-leaded pencils were passed out to all of the
subjects, who were then told that the test would be repeated under
the same time limits and that they could make any corrections or
additions that they wished. The further order was given that no
erasures were to be made in the black pencil responses. Corrections
could and must be made by merely indicating the correction by
placing the blue pencil mark where it belonged, allowing the old
record to stand. The students were assured that the blue responses
would be given precedence over the black ones in all cases. This
procedure made it possible to trace out exactly in which period each
right and each wrong response was made. Scores for single and
double time could thus be separately computed. The regular
technique of Alpha was followed in the second period except that the
instructions for certain of the tests were slightly abbreviated. At
the end of this period the blue pencils were collected and a few minutes
intermission given.

Period 3.—The subjects were given red-leaded pencils for their
further work. During the intermission the booklets of all of the
subjects who indicated that they had completed all of the work
within their capacities were examined. In case all of the items had
been attempted and it seemed likely that the subject had really
"worked himself out," he was permitted to surrender his booklet
and leave the examination. However, even under these circumstances,
each subject was urged to continue longer. A considerable
number of the "high" group could not be prevailed upon to spend
more time and were consequently excused. Those remaining for the
third period were then informed that they might take as much time to
finish as they still needed. They were told to go from test to test
without directions from the examiner, perfecting those portions of the
test that were not yet completed. In the third period, the instructions
for Test 1 were read once more, this making the final reading. In
the case of this test the regular time limits were again observed.
During the third period the subjects were excused as they finished,
the usual inspection of the booklet being adhered to. The
total working time for this period was indicated upon the test
booklet.

As has been suggested, the technique used allowed the computation
of the scores earned in single, double, and unlimited time, separately.
It should be pointed out here that, due to the fact that some of the
most rapid workers finished in double time, the scores of such
individuals are identical for double and unlimited times. This really amounts
to self-correlation in these cases, but this can be defended upon the
basis that the identity of the two sets of scores is real evidence that
power scores are actually being considered.

The examination began at 3 P. M. and the slowest subject finished
his test at 5.50 P. M., using a total working time equal to
approximately [illegible] times the regular time limits.

Statement of the Results.—The scores for single, double, and
unlimited time were tabulated separately for purposes of statistical analysis.
The important results are set forth in the following series of figures
and tables. Figure 1 shows the scatter diagram for the correlation
of the scores earned in single and double time. Figure 2 does the
same for single time and unlimited time. The scores of the high group
are shown by the dots and those of the low group by the crosses. It
will be noted that there is practically no overlapping, a result which
would be expected in view of the fact that the two extreme deciles are
alone concerned.

[Fig. 1. The correlation for the total scores earned in single time
with the total scores earned in double time.]

[Fig. 2. The correlation for the total scores earned in single time
with the total scores earned in unlimited time.]

The correlations shown in Figures 1 and 2 are given in Table I.
The Pearson product-moment formula was used.


                               Table I

                                            r        PE       N

Single time with double time..........    0.966    0.004     122
Single time with unlimited time.......    0.945    0.007     122
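
For readers who wish to check figures of this kind, the product-moment coefficient and its probable error can be sketched as follows. The text does not say which probable-error formula was used; the conventional one of the period, PE = 0.6745(1 - r^2)/sqrt(N), is assumed here, and the function names and the five sample scores are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squares and cross-products taken about the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probable_error_of_r(r, n):
    """Probable error of r by the conventional formula (assumed, not stated in the text)."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Hypothetical single-time and double-time totals for five subjects
single = [60, 85, 110, 140, 165]
double = [80, 100, 135, 160, 180]
print(round(pearson_r(single, double), 3))

# r and N as reported for single time with double time
print(round(probable_error_of_r(0.966, 122), 3))
```

With r = 0.966 and N = 122 the assumed formula gives a probable error of about 0.004, in line with the first row of Table I.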


Since the general appearance of Figure 2 suggests that the
distribution is not quite rectilinear, the correlation ratios (η) were also
computed. These are 0.967 and 0.966 (uncorrected for errors due to
grouping). The relationship is probably somewhat closer than that
suggested by the product-moment coefficient of correlation. The
agreement between the correlation of single and double time with
that reported in the Army figures is striking. The range of talent
involved in the two investigations is approximately equal (Table II)


                               Table II

                                               Army       Iowa
                                                       Experiment

Mean, single time.........................     62.0      127.6
Mean, double time.........................     80.6      149.6
Mean, unlimited time......................      ..       155.0

Gain, double over single..................     18.6       22.0
Gain, unlimited over double...............      ..         5.4
Gain, unlimited over single...............      ..        27.4

Standard deviation, single time...........     35.0       38.2
Standard deviation, double time...........     42.2       34.4
Standard deviation, unlimited time........      ..        31.9

Coefficient of variation, single time.....    0.565      0.299
Coefficient of variation, double time.....    0.524      0.230
Coefficient of variation, unlimited time..      ..       0.206


although the general mental level of the Iowa group is greatly above
that of the 510 soldiers, the means being 127.6 and 62.0, respectively,
for single time. The most important difference between the two
groups seems to lie in the fact that the variability of the Army
group increased under added time while that of the college group
decreased, probably due to the fact that many of the high college
group were drawing near to perfect scores and hence had less
opportunity to gain than was the case with the low college group. This
decreasing variability of the college group is a rather general tendency
throughout all of our data and operated as a serious limitation to the
adequacy of the present technique. Even with this constantly
increasing curtailment of the range of scores at the upper end of the college
group, it is to be noted that the Iowa group gained more (22.0 points)
than the Army group (18.6 points) in double time over single time. If
these figures are trustworthy, it would appear that the more intelligent
subjects gain more than the less intelligent subjects, the college group
being considered as more intelligent than the Army men, as is clearly
evident from the mean scores. The curtailment of the scores of the
college group at the upper end of the distribution has very probably
operated also to decrease the correlations between single and double
time, and again between single time and unlimited time.
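
The derived quantities discussed above follow from simple arithmetic, which may be sketched as below. The single-time means (127.6 and 62.0) and the gains (22.0 and 18.6) are those quoted in the text; the Iowa single-time standard deviation of 38.2 is read from Table II; the function and variable names are hypothetical.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Ratio of the standard deviation to the mean (Pearson's method)."""
    return standard_deviation / mean

# Figures quoted in the text and in Table II
iowa_single_mean, iowa_single_sd = 127.6, 38.2
army_single_mean = 62.0

# Double-time means recovered as single-time mean plus reported gain
iowa_double_mean = iowa_single_mean + 22.0
army_double_mean = army_single_mean + 18.6

iowa_cv_single = coefficient_of_variation(iowa_single_sd, iowa_single_mean)

print(f"Iowa double-time mean: {iowa_double_mean:.1f}")
print(f"Army double-time mean: {army_double_mean:.1f}")
print(f"Iowa coefficient of variation, single time: {iowa_cv_single:.3f}")
```

A falling coefficient of variation under added time, as reported for the college group, means that the standard deviation shrank relative to a rising mean, which is what a ceiling on the upper scores would produce.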

Figure 3 shows the relative gains of the high and low groups in the
Iowa experiment. The enforced curtailment of the scores of the good
group under increased working time is clearly evident in the curves.
The amount of overlapping remains approximately the same and the
low group did not seriously threaten to overtake the high group
although the lead of the latter was somewhat cut down. Further
discussion of this point will be postponed until certain additional
facts have been presented.

[Fig. 3. Relative gains of the high and low groups; curves for each
group in single time and in unlimited time.]

The correlations between single time and double time, and single
time and unlimited time, test by test, are given in Table III. The
corresponding Army figures have also been included.


                              Table III

           Single time with       Single time with
             double time           unlimited time
Test        Iowa      Army              Iowa

  1         0.823      ..               0.776
  2         0.932     0.937             0.880
  3         0.456     0.870             0.271
  4         0.925     0.940             0.806
  5         0.876     0.902             0.817
  6         0.921     0.960             0.866
  7         0.905     0.920             0.886
  8         0.946     0.910             0.919

With the exception of Test 3 (Practical Judgment), the two
sets of coefficients are in remarkable agreement. Test 3 is undoubtedly
too easy for the college group and too much of a speed test to
stand up under the conditions of the experiment.

Table IV shows the means, standard deviations, and coefficients
of variation for the Iowa group, test by test. The coefficients of
variation, here as elsewhere, are the ratios between the standard
deviations and the means (Pearson's method).


                               Table IV

               Means             Standard deviations    Coefficients of variation
Test   Single Double Unlimited  Single Double Unlimited  Single Double Unlimited

  1     7.83   9.59   10.05      2.58   2.06    1.91     0.329  0.215    ..
  2    10.96  12.89   14.06      3.42   4.06    3.93     0.312  0.315    ..
  3     9.96  13.97   14.41      2.32   1.53    1.23     0.233   ..      ..
  4    20.53  22.91   23.37      9.34   9.72    9.80     0.455  0.424    ..
  5    16.79  19.03   19.39       ..    6.00    4.68     0.330   ..     0.242
  6    11.37  13.68   14.89      4.16   4.72    4.29     0.366  0.345   0.288
  7    26.47  30.78   31.57     11.02   9.80    8.89      ..    0.318   0.282
  8    23.91  26.63   26.98      7.40   5.76    5.67      ..    0.216   0.210


Table V presents the gains, test by test, figured from the means
given in Table IV.

                               Table V

Test.......................    1     2     3     4     5     6     7     8

Double over single.........  1.76  1.93  4.01  2.38  2.24  2.31  4.31  2.72
Unlimited over double......  0.46  1.17  0.44  0.44  0.36  1.21  0.79  0.36
Unlimited over single......  2.22  3.10  4.46  2.82  2.60  3.52  5.10  3.07

Army results:
Double over single.........   ..    ..   3.63  2.08  3.14  1.10  3.80  4.20


These gains for the separate tests cannot be compared with much
meaning since the steps between the various items are of quite
unequal difficulties from test to test. Likewise the gains in the
several tests by the Iowa group cannot be compared directly with the
Army group since the two groups were working at widely different
levels on the total test. To gain one score point from 175 as a base
may be considerably more difficult than to gain 1 point with a score
of 75 as a base. Certain tests like No. 7 (Analogies) permitted rather
marked gains to be made but Test 1 (Directions), Test 4
(Synonym-Antonym), and Test 5 (Disarranged Sentences) allowed
comparatively little gain from single to unlimited time. It may be concluded
that with the exceptions of Test 2 (Arithmetical Problems) and Test
6 (Number Series Completion) there would seem to be absolutely no
justification for allowing more than double time at most. These are
the two tests of the battery that involve mathematical abilities.
In but one other case (Test 7, Analogies) did the gain of unlimited time
over double time exceed five-tenths of a point.
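
The point about unequal bases can be made concrete with a small sketch. It expresses a gain of one point as a percentage of the two base scores named above (175 and 75); the function name is hypothetical, and the percentage is offered only as an illustration of why raw gains at different levels are not directly comparable, not as a measure of difficulty.

```python
def relative_gain(base_score, points_gained):
    """Gain expressed as a percentage of the base score."""
    return 100.0 * points_gained / base_score

# One score point gained from each of the two bases mentioned in the text
for base in (175, 75):
    print(f"base {base}: {relative_gain(base, 1):.2f} per cent")
```

The same one-point gain is a much smaller fraction of the higher base, so a comparison of the two groups in percentage terms will favor whichever group starts lower.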

Table VI gives the means, test by test, for the high and low groups
separately, period by period. The maximum score for each test is
given for comparison with the obtained scores.


                               Table VI

          Single time      Double time     Unlimited time    Maximum
Test     High     Low     High     Low      High     Low      score

  1     10.10    6.14    11.10    8.47     11.35    9.09      12.00
  2     14.10    8.64    16.79   10.00     17.86   10.27      20.00
  3     12.38    8.73    14.52   13.44     14.63   14.37      16.00
  4     29.16   14.30    31.84   16.29     31.96   16.61      40.00
  5     21.00   12.21    22.62   16.97     22.61   17.41      24.00
  6     15.50    8.30    18.21   10.44     18.75   12.02      20.00
  7     36.58   19.21    38.54   25.01     38.54   26.39      40.00
  8     30.29   19.19    31.46   23.06     31.46   23.30      40.00

Table VII shows the gains computed from the means of the two groups.

                              Table VII

         Double time over     Unlimited time over    Unlimited time over
           single time            double time            single time
Test      High     Low          High     Low           High     Low

  1       1.00    2.33          0.25    0.62           1.26    2.95
  2       2.69    1.36          1.06    0.27           3.75    1.63
  3       2.14    4.71          0.11    0.93           2.25    5.64
  4       2.69    1.99          0.12    0.32           2.81    2.31
  5       1.62    4.76          0.00    0.44           1.61    5.20
  6       2.71    2.14          0.54    1.58           3.25    3.72
  7       1.96    5.80          0.00    1.38           1.96    7.18
  8       1.17    3.87          0.00    0.24           1.17    4.11
Table VIII shows the standard deviations of the scores of the
high and low groups, test by test, and period by period.

                               Table IX

          Single time       Double time      Unlimited time
Test     High     Low      High     Low       High     Low

  1     0.124     ..      0.081    0.231       ..     0.212
  2     0.173    0.213     ..       ..         ..     0.263
  3     0.178     ..      0.080     ..         ..     0.081
  4     0.188     ..      0.108    0.388       ..     0.380
  5     0.126    0.432     ..      0.304       ..     0.220
  6     0.120    0.282     ..       ..         ..     0.280
  7     0.081     ..       ..       ..        0.046   0.322
  8     0.142    0.270     ..      0.184      0.122   0.180


Examination of the relative scores of the two groups tabulated
in Tables VI, VII, VIII and IX reveals several interesting differences.
Figures 4 and 5, which follow, show the same facts in terms of
percentage distributions of the scores for the two groups in the separate
tests for single and unlimited time. Double time is not shown in
these graphs.

In the first place, in almost every test there is good evidence that
the high group suffered under the handicap of being forced to work at
the upper part of the test series where there was little opportunity for
gain after the single time period. The scores of the high group are
crowding the maximum possible scores so closely that there results
marked curtailment of the distributions in almost every test. This
shows in the Table of Means, in the Tables of Gains, and in the Tables
of Variability with great consistency. The low group, except in one
or two tests, suffered under no such limitations. Their variabilities
and gains were almost uniformly greater than those of the high group.
In no case is the high group as consistently variable as the low group.
In two of the tests, Test 2 (Arithmetical Problems) and Test 4
(Synonym-Antonym), the high group improved more than the low
group as a result of the added time allowances. In all other tests
the low group improved more. However, in no test did the low group
ever reach the level of performance of the high group, with the possible
exception of Test 3 (Practical Judgment) which, as has already been
suggested, appears to be far too easy to have much value with superior
adults except when administered as a speed test.
That the low group gained more than the high group is to be expected
in view of the limited possibilities of the high group in earning additional
points. The Army results were exactly opposed in this respect when
increases in absolute score points are considered, although, as the Army
report¹ points out, in terms of percentages, the poorer subjects gained
most (a fact which is a mathematical function of the differences in
the bases from which the percentages are computed).

[Fig. 4. Percentage distributions of the scores of the high and low
groups in the separate tests, single and unlimited time.]

¹ See Table 74, p. 416, op. cit.

Summary and Conclusions.—It has become increasingly evident
in the preceding discussion that Army Examination Alpha is not
entirely satisfactory as an instrument for studying the factors of speed
versus power, at least with intelligent adults. The reason for the choice
of Alpha for this investigation was principally that of verification of
the work reported in Psychological Examining in the United States
Army. Our results verify the Army work in all important conclusions
as far as the conditions are strictly comparable.

For use with college students a test with more "top" is demanded.
The Thorndike Intelligence Examination for College Entrance would
probably prove more satisfactory for work with adults. Nevertheless,
there is one very good reason for the use of Army Alpha in the
present study that should be emphasized, viz., that Alpha is probably
typical of what we term speed tests. The Thorndike Test places
much more of a premium on power, or long-sustained effort. One of
the writers (G. M. R.) is at present engaged upon further studies of
this general problem with other group tests of intelligence and
educational abilities, using children young enough to obviate the possibility
of approach to perfect scores even with the most intelligent subjects.
Under such conditions it is thought likely that the bright subjects
might be found to improve more, not less, than the duller ones. This
possibility is strongly suggested by the earlier work of Thorndike and
his students on learning and by more recent but as yet unpublished
studies of one of the authors (G. M. R.) upon the learning of children
of the same chronological ages but widely differing intelligence
quotients.

With this general limitation in mind and properly allowed for in
interpreting the results of the present study, the following conclusions
are suggested as being indicated by our data:

1. Admitting that Army Alpha is largely a speed test, the fact that
single time correlates 0.966 with double time, and 0.946 with
unlimited time indicates that the speed factor does not seriously
invalidate the test. In fact, it can be shown from figures
already presented that the probable error of estimating scores
for double time and for unlimited time from scores earned in
single time is about 6.7 and 8.4 score points, respectively.¹

2. Increasing the time allowances does not permit dull subjects 
to equal the scores of the more intelligent subjects. In fact 
the mean of the low group for unlimited time was still well 
below the mean of the high group for single time (Figures 1-3). 
Whether the differences between high and low groups are 
decreased or augmented by increasing the time limits cannot 
be definitely answered from the present data because the scores 
of the high group were too near the maximum possible to 
allow equal opportunity to both groups. The Army figures 
seem to indicate that, in terms of absolute scores, the brighter 
subjects improve somewhat more than do the dull. 

3. When total scores are considered, there was no increase in the 
amount of overlapping of the high and low groups when the 
time allowances were increased. 

¹ Computed from the usual formula:

$PE_{est} = 0.6745\,\sigma_1\sqrt{1 - r_{12}^2}$

4. Rather marked differences in susceptibility to the influence
of the speed factor were evident in the different tests, certain
ones being much more open to this objection than others.
This fact suggests a new criterion for the validation of tests,
scales, and other types of measurements. The only test which
was invalidated by the increased time limits was Test 3
(Practical Judgment). Test 1 (Oral Directions) is also not very
satisfactory. Test 2 (Arithmetical Problems) and Test 4
(Synonym-Antonym) became even more discriminating with
added time.

5. The present results substantiate the important findings of the
earlier Army investigation when proper allowance is made for
the fact that Alpha is far too easy for the good students of the
college group. The correlations reported by the two
investigations are in striking agreement.
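
The probable errors of estimate quoted in Conclusion 1 can be checked with a short sketch. It assumes the usual formula, 0.6745 times the standard deviation times the square root of (1 - r^2), and it assumes that the standard deviation entered is that of the single-time scores (38.2); with that assumption the stated figures are reproduced. The function name is hypothetical.

```python
import math

def probable_error_of_estimate(standard_deviation, r):
    """Probable error of estimating one score from another correlated r with it."""
    return 0.6745 * standard_deviation * math.sqrt(1 - r ** 2)

# Assumption: the single-time standard deviation of the Iowa group is used
SD_SINGLE_TIME = 38.2

pe_double = probable_error_of_estimate(SD_SINGLE_TIME, 0.966)
pe_unlimited = probable_error_of_estimate(SD_SINGLE_TIME, 0.946)
print(round(pe_double, 1), round(pe_unlimited, 1))
```

Under these assumptions the two values round to 6.7 and 8.4 score points. Entering the double-time or unlimited-time standard deviations instead would give somewhat smaller figures.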



SOCIAL RATING OF BEST AND POOREST HIGH
SCHOOL STUDENTS

PAUL V. SANGREN
Department of Education

Western State Normal School, Kalamazoo, Michigan

It is apparent that a fairly high correlation exists between
scholarship and scores on intelligence tests. It would be interesting to know
(1) whether there are qualities in addition to intelligence which affect
scholarship, and (2) whether the best students do not also possess
greater possibilities of success in the practical world. Upon these two
points the writer, while superintendent of schools at Zeeland, Michigan,
had opportunity to collect some data.

Zeeland high school is small, the enrollment being 165 students.
This, together with the fact that many of them had been in the
system one or more years, made it possible for the 9 high school
teachers to become well acquainted with the students. Teachers
could, therefore, be counted upon to pass fairly intelligent judgment
upon the qualities and traits possessed by individual students.

All high school teachers constructed independently a rating scale
using the following eight abilities or qualities: methods of work,
application, industry and attitude toward work, ability to assimilate
new ideas, physical vigor, social and personal qualities, leadership, and
team-work. There were 5 degrees for each of these qualities with
numerical ratings to correspond, these degrees and ratings being
divided as follows: best student, 38; better than average, 30;
average, 22; poorer than average, 14; poorest student, 6. The
scale was, therefore, modelled after Form B of Rugg's Rating Scale
for Judging High School and College Students. To construct the
scale the teachers filled the blank spaces with the names of high school
students who were thought to possess the qualities in the various
degrees. Explanations concerning the meaning of each quality or
ability were given the teachers in printed instructions.

Having completed the construction of their scales, the teachers
rated 24 high school students by direct comparison with their own
scales. Of the students rated, 12 were students who made high scores
on Terman's Group Test of Mental Ability and averaged over 90 per
cent in all of their school work for the months of September, October,
and November, and 12 were students who made low scores on the
Terman test and averaged less than 83 per cent in all of their school
work for the three school months mentioned. Of each of the two
groups of 12 students, 3 were selected from each of the four high
school grades. Teachers were instructed to pass judgment upon no
student whom they did not know well and to rate numerically. They
were not told how the students were selected.

When the ratings were completed, it was found that no student
was rated by less than 4 teachers nor more than 7 teachers. While
this is not a large number of ratings for a single student, it is equal to,
if not greater than, the number of ratings made as a rule in judging
the efficiency of a teacher. For simplicity we will call the group of
best students the "best group" and the group of poorest students the
"poorest group." We will also call the qualities of methods of work,
application, industry and attitude toward work, and ability to
assimilate new ideas the "scholarship qualities" and the qualities of physical
vigor, social and personal qualities, leadership, and team-work the
"citizenship qualities."


Table I.—Teachers' Average Ratings of "Best" and "Poorest" Groups
                      in Scholarship Qualities

                                          Group
Scholarship quality                   Best    Poorest

Methods of work...................    32.1      ..
Application.......................    33.0      ..
Industry and attitude.............    33.1     10.7
Assimilation of new ideas.........    32.0     12.7
Averages..........................    33.5     16.5


When the average ratings of each individual teacher were
considered, it was found that they consistently rated the "best group" as
"better than average" in the scholarship qualities. There were
one or two exceptions among the teachers, but none of them rated
below average in these qualities. Furthermore, the teachers almost
consistently rated the students of the "poorest group" as "poorer
than average" in those same qualities. This will appear upon
examination of Table I, which merely presents the average ratings in
scholarship qualities for both groups. Tables covering the average ratings
received by each individual student will not be included here. Figure
A, showing in graphic form the ratings of Grade IX students of both
the groups in intelligence, scholarship, and scholarship qualities, will
give a clear idea of how teachers judged individual students. A study
of this figure will make it evident that students above average in
intelligence and in scholarship are rated above the average in
scholarship qualities, while students below average in intelligence and
scholarship are rated below average in scholarship qualities. The
same facts would be evident if individual ratings of all students of the
remaining three grades were included.



[Figure A. Comparative rating of three best and three poorest high
school freshmen. Broken lines represent poorest students; unbroken
lines represent best students.]


Table II shows the average ratings of both groups of students in
the "citizenship qualities." The teachers were fairly consistent in
rating the best group of students somewhat above average in these
qualities and in rating the poorest group "poorer than average."
This will be seen when the average ratings of the groups are compared
as in Table II. Figure B will show graphically the ratings of
individual students of Grade XI involving the two groups. Here the
comparisons are made between ratings in intelligence, scholarship,
and citizenship qualities. As a rule, students who are rated above
average in intelligence and scholarship are rated above average in the
citizenship qualities, while students below average in intelligence
and scholarship are rated below average in citizenship qualities. If
comparisons were made for individuals of the other three grades, the
results would be similar. It seems that there is a tendency, however,
for a slighter discrimination between best and poorest students in
citizenship qualities as we pass up through the years in high school
and reach Grade XII. Shall we say that this is due to elimination,
or shall we say that it is due to training in high school?

Table II.—Teachers' Average Ratings of "Best" and "Poorest" Groups
                      in Citizenship Qualities

                                          Group
Citizenship quality                   Best    Poorest

Physical vigor....................    24.8      ..
Social and personal qualities.....    27.2      ..
Leadership........................    26.2     11.1
Team-work.........................    26.0     13.0
Averages..........................    26.0     13.0


It is also interesting to note that the slightest difference which
occurs between members of the best group and members of the poorest
group in citizenship qualities is in the quality of physical vigor. But
in spite of this fact the best students have the greater physical vigor.
Correlations between intelligence and total ratings in scholarship and
citizenship qualities, and between scholarship and total ratings in
scholarship and citizenship qualities, were calculated by the Spearman
Rank Order Method. The results were as follows:


Intelligence and scholarship qualities.........  0.86
Scholarship and scholarship qualities..........  0.93
Intelligence and citizenship qualities.........  0.77
Scholarship and citizenship qualities..........  0.92
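
The rank-order computation behind coefficients of this kind may be sketched as follows. The formula is Spearman's, rho = 1 - 6(sum of d^2)/(n(n^2 - 1)), which assumes no tied ranks; the function name and the six pairs of ranks are hypothetical and are not the Zeeland data.

```python
def spearman_rank_order(ranks_a, ranks_b):
    """Spearman rank-order coefficient for two untied rankings of the same subjects."""
    n = len(ranks_a)
    # d is the difference between a subject's two ranks
    sum_d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

# Hypothetical ranks of six students on intelligence and on total rating
intelligence_ranks = [1, 2, 3, 4, 5, 6]
rating_ranks = [2, 1, 3, 4, 6, 5]
print(round(spearman_rank_order(intelligence_ranks, rating_ranks), 3))
```

With only 24 students, a single pair of students exchanging places can move the coefficient appreciably, which is worth bearing in mind when comparing values such as 0.86 and 0.93.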


Conclusions

When scores on the Terman Group Test of Mental Ability are taken
as measures of intelligence and average marks in all subjects for
September, October, and November are taken as measures of
scholarship, the teachers make the following judgments concerning
"scholarship" and "citizenship" qualities possessed by the poorest and best
high school students at Zeeland:

1. The twelve students of the four high school grades who are
rated high in intelligence and scholarship are rated "better than
average" in industry and attitude toward work, methods of
work, application, and ability to assimilate new ideas. The
twelve students of the four high school grades who are rated
low in intelligence and scholarship are rated "poorer than
average" in these same qualities.


[Figure B. Comparative rating of three best and three poorest high
school juniors in intelligence, scholarship, physical vigor, social and
personal qualities, leadership, and team-work. Broken lines represent
poorest students; unbroken lines represent best students.]


2. When average ratings of individual students are considered, 
those who are above average in intelligence and scholarship are 
rated above average in the qualities mentioned in Conclusion 1, 
while students individually rated below average in intelligence 
and scholarship are rated below average in these qualities. 

3. The correlation between intelligence and the total rating in
methods of work, industry and attitude toward work,
application, and ability to assimilate new ideas is high, being 0.86.
The correlation between scholarship and these qualities is very
high, being 0.93.

4. The 12 students who are rated high in intelligence and
scholarship are rated somewhat above average in the qualities of
physical vigor, social and personal qualities, leadership, and
team-work, while those who are rated low in intelligence and
scholarship are rated "poorer than average" in these qualities.
This holds true in the individual rating and comparison of
students as well.

5. The correlation between intelligence and the total rating in
physical vigor, social and personal qualities, leadership, and
team-work is fairly high, being 0.77. The correlation between
scholarship and the qualities just mentioned is very high,
being 0.92.

Thus it would appear that (1) scholarship of high school students
is determined by the student’s methods of work, application, industry
and attitude toward work, and ability to assimilate new ideas as much
as by intelligence; (2) that the best students possess in a greater degree
the qualities which will make for success in the practical world, and
that (3) although the brilliant student is often heralded as a "freak,"
it would be safer to gamble upon his success than upon the success of a
poorer, less intelligent student.



A PRELIMINARY STUDY OF THE PROBLEMS IN 
THE TRAINING OF THE NON-PREFERRED 
HAND 

DORA KEEN MOHLMAN

Bureau of Educational Research, University of Illinois, Urbana, Illinois

The purpose of this study is (1) to designate certain points of
significance in the training of the non-preferred hand; (2) to make some
suggestions for the technique to be employed in this training; (3) to
suggest problems for experimentation in this field; (4) to present a
rather comprehensive bibliography of the literature dealing with the
various problems of handedness.

In order to acquaint the reader with the technical terminology of
the present study as well as of other studies in the field of handedness,
the meanings of certain widely used terms are stated briefly in the
following paragraph.

The general fact of uneven-handedness is denoted by the term
dextrality. The hand which is inferior to the other in dexterity and
strength, and which in consequence is less frequently used, is spoken
of as the non-preferred hand. A person who is naturally right-handed
is spoken of as a dextral; a person who is naturally left-handed as a
sinistral. An individual is designated as a dextro-sinistral if he is
left-handed but has learned to write with his right hand. This term
may be also used to designate all left-handed persons whose right
hand has been trained to carry on any activity, other than that of
handwriting, natural to the left hand. Either of the terms ambidexterity
or ambidextrality may be used to describe the condition
wherein neither hand is preferred over the other and where each
hand can be used alternately to perform the same task.

I. Significant Points 

Value of Ambidextrality. The question as to whether all children
should be trained to make an equal use of both hands has been given
some attention. In certain of the discussions a purely practical appeal
for ambidextrality is made. This appeal is based on the increase in
efficiency which would thus result in the performance of the movements
required by trades or professions and by the necessary acts of daily
life, and on the benefit which would accrue in case of the loss of the
naturally preferred hand. Most of the authorities, as well as the mass
of laymen, however, are still of the opinion that ambidextrality has
little practical value.

Certain investigators, among whom are Kipiani (86) and Macnaughton
(104), advocate the symmetrical education of both sides of
the body as a means of preventing aphasia, St. Vitus’s dance, tic, and
various disorders of the nervous system. The medicinal value of
this education lies in the fact that both sides of the brain will be made
to function and its latent force brought into play. They recommend
also teaching patients who are suffering from war aphasia to write and
draw with the left hand in order to develop the functions which have
a seat in the right cerebral hemisphere. The work of these authors,
however, does not seem to have been conclusive, for the training for
ambidextrality as a valuable means of prevention or cure of nervous
disorders has not received general recognition.

Speech Disturbance. The opinion that teaching the left-handed
child to use his right hand causes disturbances in the mechanism of
speech has been widely discussed and seems to have been rather
widely accepted. This opinion is largely based on the results of
Ballard’s (11) investigations of London school children in which
figures are given which purport to show that training the left-handed
child to write with his right hand produces speech defects. Wallin
(160) in his "Report on Speech Defectives in the St. Louis Public
Schools," however, advances evidence which "corroborates only
mildly, if indeed at all, Ballard’s conclusions." On seeing the preliminary
report of Wallin's investigation, Dr. Ballard writes as follows:
"I regard my investigation as more or less preliminary or suggestive
rather than conclusive." Many, perhaps, would find a greater
degree of corroboration between Ballard's and Wallin's conclusions
than Wallin himself does, yet it is evident that the results so far
obtained have not definitely settled the question of the relation of
left-handedness and speech defect.

War and Industrial Cripples. The people of the United States,
up to a very recent date, have been indifferent toward the industrial
cripples of the country. It has been little known or realized that each
year's toll of men permanently disabled through industrial accidents
approximates 70,000 or 80,000.¹ Of these approximately 0000
suffer the entire or partial loss of an arm. The vital problem of the
War’s crippled is turning our attention, however, to the perhaps even
more vital problem of industry’s crippled. It is very probable that
in the near future we shall have institutions for the disabled in industry
similar to the hospitals and schools for the rehabilitation, re-education,
and vocational training of the disabled soldiers and sailors.

¹ Rubinow, I. M.: A Statistical Consideration of the Number of Men Crippled
in War and Disabled in Industry. Publications Red Cross Institute for Crippled
and Disabled Men, Series 1, No. 4, New York, 1918, pp. 17.

In conditions such as these a study of the problems connected
with the use of the non-preferred hand becomes more deeply meaningful
than ever before.

Methods of Diagnosing Handedness. A study by Arthur L. Beeley
(18) contains an excellent discussion of the problems connected with 
diagnosing the native handedness of children together with tests 
which many would agree “will render diagnosis of handedness more 
accurate than is possible by any other existing test or method.” 

In an endeavor to answer at least partially the question as to what
can be done to bring about a more adequate adjustment of the left-handed
child to his right-handed environment, Beeley sets as his
task (1) the derivation of a test or tests for diagnosing the native
handedness of children; (2) a discussion of the relation between
left-handedness and “mirror writing.”

He evaluated certain existing tests for diagnosing handedness
and stated their limitations. The tests discussed were the strength
of grip, the tapping, the tracing, and the steadiness tests described
by Whipple (164); and the Brachiometer test used by Jones (73)
according to his theory that the right- or left-handed child is born
with larger bones on his right or left side respectively.

As a method of procedure in the derivation of his test, Beeley
chose certain existing and suggestive tests and correlated their results 
in diagnosing handedness with the actual facts of handedness in a 
group of children from the Grades III, IV, V, and VI. The hand the 
child used most frequently was determined by the child’s statement 
corroborated by the teacher. It was assumed that a test involving 
dexterity would be superior to a test of skill or endurance. 

The tracing, tapping, and steadiness tests described by Whipple
were studied. In the tracing test B, a specially designed apparatus
was used whereby contacts were broken automatically, thus eliminating
the objection to the old tracing test that a constant contact
or one of long continuation is registered as one contact only.

Among the conclusions which were reached, the following seem the
most significant to the present study:

1. The finger-tapping test is superior to the wrist-tapping test
because: (a) Its diagnoses correlated more perfectly with the known
facts; (b) it revealed a greater difference between the two sides of the
body. By reason of similar results, the wrist-tapping test was judged
to be superior to the arm-tapping test.

2. There existed a perfect positive correlation between the finger- 
tapping test and the known facts. 

3. The tracing test B evinced all characteristics of a valid test
for diagnosing handedness because: (a) Its results correlated perfectly
with the known facts; (b) the lower the grade, the increasingly greater
difference it revealed between the dexterity of the two hands.

Mirror Writing. In Beeley’s experimentation concerning the relation
between left-handedness and "mirror writing," 42 out of 106,365
school children were found to be "mirror writers," that is, one out of 2500.
There appeared to be a perfect positive correlation between "mirror
writing" and left-handedness. Approximately only one per cent of the
left-handed children, however, were "mirror writers." Baldwin's
(0) theory was advanced in this study as the best explanation of the
cause of "mirror writing." He claims that "mirror writing" in children
is probably due to the incomplete association of the series of hand-movement
sensations with the control series of visual sensations.
Beeley’s investigations resulted in the conclusion that "mirror writing"
did not necessarily have a positive correlation with mental deficiency
and certainly not with visual defect. He suggested as a method of
correction that the pupils should write from a copy, not from memory,
making first the incorrect, then the correct form of the letters with the
left hand.
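
The proportions quoted in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the two counts given in the text; the step from "one per cent of the left-handed children" to an implied number of left-handed children is an inference made here for illustration (it assumes that every mirror writer was left-handed), not a figure reported in the study.

```python
# Counts quoted in the text.
children_surveyed = 106_365
mirror_writers = 42

# "One out of 2500": invert the observed proportion.
children_per_mirror_writer = children_surveyed / mirror_writers

# ASSUMPTION (ours, not the study's): all 42 mirror writers were
# left-handed and they made up about 1 per cent of the left-handed group.
assumed_percent_of_left_handers = 1
implied_left_handers = mirror_writers * 100 // assumed_percent_of_left_handers
implied_left_handed_rate = implied_left_handers / children_surveyed

print(f"one mirror writer per {children_per_mirror_writer:.0f} children")
print(f"implied left-handed children: {implied_left_handers}")
print(f"implied left-handed rate: {implied_left_handed_rate:.1%}")
```

Under that assumption the rounded "one out of 2500" is consistent with the raw counts, and the implied share of left-handed children comes to roughly four per cent of the group surveyed.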

According to Mlle. Joteyko (79), all beginners at left-hand writing
have a tendency to "mirror writing." Other authors (1,04) accept this
theory as very probable, and believe that the ease or difficulty with
which the natural tendency to "mirror writing" is overcome depends
on the strength or weakness of the visual factor in imagery, and that
the subject who makes little use of visual imagery is more likely to
fall into "mirror writing."

II. Suggestions for Training the Non-Preferred Hand or Arm

Mlle. Joteyko (79) makes some rather general suggestions as to
methods for training the non-preferred hand and arm in cripples.
These suggestions are based on her interpretation of certain physiological
principles which in her opinion are involved. She stated that
every attempt should be made to enable the cripple to resume his
former occupation. The left hand will thus perform the movements
formerly carried out by the right hand. In this way the apprentice
will reap advantage from the fact that one hand learns more quickly
that which has already been acquired by the other hand. She advances
the theory that in consequence of the bilateral symmetry of
the body, "mirror writing" or "mirror movement" is natural to the
left side. Therefore in training the left hand for either an old or a new
trade it should be forced to carry out the movements of the right hand
in inverse direction. The larger movements should be perfected first,
and the finer adjustments later. It is also stated that any effort put
forth by the left hand causes a greater fatigue to the heart than the
same effort put forth by the right hand and that this fact must be
taken into consideration in the training of the left hand. A change
of trade should be recommended when a physician's examination
shows too great a strain on this vital organ.

Burnette (23) gives a brief but rather suggestive description of a
method in use at the Whitby Military Hospital. The apprentice is
taught to trace on frosted slates; first, simple geometric designs,
sketches, pictures; then letters, and finally, whole words. The
tracing automatically corrects the back-slope habit which arises
because the index finger obscures the pupil’s work. After a fair control
over the left hand has been achieved, the apprentice’s work is
varied by having him trace a design or word and then copy it on a slate.
The next step is to copy on paper instead of on the slate. Individuals
are trained to write both legibly and rapidly within from 5 to 10 days
by this method.

In this hospital modeling in plasticine has also been found to be
an excellent means of producing deftness of movement with the left
hand. Here, too, a left-handed man who had to be educated in the use
of his right hand learned direction by playing ping pong.

Superintendent John G. Kerr (85) of the Pilkington Orthopaedic
Hospital, Lancashire, trains certain of the pupils who are learning to
write with the left hand, to write backwards, that is, the last of the
letter, the last of the word, and the last of the line, first. Since only
a few have been found, however, who are very successful in acquiring
this back-hand writing, "mirror writing" is more frequently taught
here. The writing is done on a sheet of paper placed on a piece of
carbon paper. On the reverse side of the sheet may be read what has
been written. "Mirror writing" is favored because: it is learned more
readily than backhand writing; it is more satisfactory in respect to
uniformity, legibility, and speed; and it is astonishingly alike in all
respects to the writing which was done by the right hand. Superintendent
Kerr offers no explanation of the reason why it is necessary
for the left hand to write in an inverse direction to that taken by the
right hand.

A six weeks course (107) given by various schools in Germany
to fit the man with only a non-preferred arm to take up the training
for a trade, beside others who are less seriously handicapped, includes
instruction in the ordinary acts of life such as eating, dressing, tying
knots, using simple tools, and writing. It is felt that at this point a
great part of the teacher's duty is to convince the men that all these
things are possible and need only practice to be learned. In addition,
practice in drawing, designing, and modeling in clay, with the left
hand is often added as a means of functional re-education.

In the description of this course it is stated that certain German
teachers have made a scientific study of the question of left-hand
writing, and that several text-books have been written on the subject.
They are, however, unavailable to the present writer.

Jules Amar (5) has made an exhaustive study of the problems of
physical rehabilitation and functional re-education, and is considered
one of the greatest present day authorities in this field. In a discussion
of the means for "la formation des gauchers" he points out the
following helpful exercises and says that their continued repetition at
an increasing rate of speed guarantees an excellent training of the left
hand within 5 or 6 weeks.

With his left hand the patient should practice making blows with
a hammer of one kilogram weight until he is able, regardless of the
height to which the hammer is raised, to hit with a sharp, quick blow
a small piece of crayon placed on an anvil. A record of the patient’s
progress showing the increase in rate of speed and amplitude of stroke
may be obtained by means of a device described by the author which is
controlled by a pulley attached to the hammer with a small cord. The
pupil should also practice until he has become skilful in tracing with
his left hand an oval of 20 to 25 millimeters diameter and a square of
26 millimeters cut out in a sheet of copper. This copper sheet should
be placed on a paper and the edges of the openings followed with a pen
or stylus.

The most practical and the most carefully worked out system of
training the maimed man to write with his left hand, however, which
has yet reached us is the one of Albert Charleux (27), a young French
tutor. After suffering at the age of 14 the amputation of his entire
right arm he learned to write by the method he describes. Since the
beginning of the war he has devoted its use to the instruction of the
disabled soldier.

He states that the left arm should not be expected to perform at
first those small movements needed in writing. The first suppling
of the arm and the first training in making the finer muscular adjustments
should be obtained by practicing certain exercises (a series of
straight, broken and circular lines) on the blackboard. The indicated
direction of each line must be held. The number of lines and applications
in each exercise is not absolute. The apprentice can add more
lines and devise more applications if he desires. At first the entire
arm movement should be used. The movement should then be
gradually restrained until the wrist movement is obtained. The
exercises should be practiced in the order given, from the least to the
most difficult, until the patient can easily, and to his own satisfaction,
perform the entire series with the wrist movement.

Writing on the blackboard is taken up next. Here also the
apprentice restrains the arm movement with which he begins until a
movement of wrist and finger is obtained. The same copy is reproduced
in letters of four different sizes. The heights of the four sizes
are: 12-16 cm., 6-7 cm., 3-4 cm., and 1-2 cm. The work is continued
until a legible and a regular copy of the smallest model is secured with a
wrist and finger movement.

The apprentice is now ready to begin writing on paper. Some
directions are given him as to the proper positions of his body and of
the paper, the correct size of the pen, and the best method of holding
the paper. A person with his right arm off above the elbow and no
artificial arm has no natural means of support for his right side. He
should be very careful to maintain his body in an upright position
squarely in front of the desk. Leaning to the right side causes a low
sloping shoulder and a posture which gives fatigue to the eyes. Since
in writing with the right hand the paper is slightly inclined toward the
left, in left-hand writing it should be inclined toward the right. A pen
of medium size is best. The ordinary pen point is adequate as the
beaks are symmetrical. The pen should point over the left shoulder
and should be held loosely and easily by the first three fingers. The
paper can be most easily held in place by means of a metal paper
weight. It is advisable to have a handle placed near the top of the
weight to facilitate its shifting.

The names by which different parts of the letters are designated
in the French system of handwriting are also given. Some suggestions
are set forth concerning the best angle of slant, the correct proportion
between the different parts of the letter, the order in which the letters
should be learned and their size in the four types of handwriting given:
the Cursive, the Ronde (round), the Bâtarde (running), and the
Gothique. Three different sizes of letters are given in each of the four
types and it would seem that the procedure of going from large to
small is to be carried out in writing on paper as it is elsewhere, although
no statement is made to this effect. The kind of movement which
should be used is not definitely given but it is practically certain
because of the emphasis placed in earlier exercises on the wrist and
finger movement that this movement is to be finally attained.

In most of the large number of recent and current publications
which describe the work of various schools in training the one-armed
men, either the results are described without any indication of the
means by which they were obtained or no method is mentioned other
than that of “go at the job and practice until you can do it.” This
is true of writing as well as of other tasks. All specific suggestions
and methods for the education of the non-preferred hand and arm
which were available have been described in the present study. Some
excellent sources for this material, however, are at present unavailable.

The following conclusions summarize briefly the most important
points brought out in the preceding discussion of the methods for
training the non-preferred hand or arm:

1. In training the non-preferred hand or arm to perform any set of
movements, the larger adjustments should come first and the smaller,
or finer, last.

2. Practice in performing the thing to be learned is clearly the
method for the training of the left hand and arm which is most in use
at present. This method is without doubt very valuable in regard
both to its practicality and to its efficiency in achieving results.

III. Suggested Problems for Experimentation

1. The statements made by Joteyko, Schuyten, Kipiani, and
Macnaughton suggest the need for additional evidence as to whether
the training of the left hand or arm to perform hitherto unaccustomed
movements contributes to the cure of patients suffering from aphasia
and other nervous disorders.

2. There has been no attempt, to the knowledge of the present
writer, to determine whether one hand is aided, judged from speed
and accuracy, in learning to perform a given task by the fact that the
other hand is skilled in that performance. Data on this question would
doubtless prove of great value.

3. Conclusive evidence as to whether an amount of work done by
the left hand causes more strain on the heart than the same amount 
of work carried out by the right hand would be valuable. Unusual 
difficulties in framing a suitable plan of experimentation, however, 
would likely present themselves. 

4. The problem of securing a means for predicting a man’s ability,
measured by speed and accuracy, in learning to use the non-preferred
hand, while perhaps less important than some of the preceding topics,
presents an excellent field for experimentation.

5. Evidence additional to that given by Wallin and Ballard is
needed to settle definitely the question concerning the relation of
left-handedness and speech defects.



THE CONSISTENCY SHOWN BY INTELLIGENCE RATINGS
BASED ON STANDARDIZED TESTS AND
THE TEACHER'S ESTIMATES

J. E. WALLACE WALLIN

Bureau of Special Education and Department of Clinical Psychology,
Miami University

The writer began to give Binet Tests and a dozen group tests of
intelligence of his own construction in the latter part of 1909. The
use of the Binet Tests after the 1908 revision became known spread
like wild-fire throughout the country and throughout the civilized
world, but the use of group tests for measuring intelligence made no
headway whatever until the United States was drawn into the World
War.

The rapid spread of the Binet Scale was due partly to the great
practical utility of the scale, and partly to the extravagant claims made
regarding it. According to the propagandists’ claims “Binet’s plan
was perfect” for measuring “native intelligence;” the scale was a
“marvel of accuracy,” so “amazingly accurate” as to admit of little
or no improvement from ages 5 to 12; the scale provided a well-nigh
“infallible” means of determining whether a child was feeble-minded,
which could be effectively used even by “novices,” or “untrained or
wrongly trained persons,” “nothing else” being needed for diagnosing
“feeble-mindedness” in the great mass of cases than this test. After
attempting to subscribe to such opinions as these for a year wholly
devoted to clinical practice, the writer became thoroughly convinced
that they could not be substantiated, and the burden of most of his
papers and addresses during several years was to point out the nature
of the exaggerations and of the limitations and defects of the Binet
Scale with respect to its adequacy as an instrument for measuring
“native intelligence,” with respect to the accuracy of the age placement
of the individual tests and of the aggregate age standards, and
with respect to the sufficiency of the scale for diagnosing mental
deficiency, particularly in the hands of “amateurs” and especially by
means of certain arbitrary standards of intelligence defect which had
received almost universal acceptance, and which he was forced from
personal and direct study of a great variety of cases to reject almost in
toto.

After over a decade of poignant criticism by numerous able writers
here and abroad, the Binet scale continues to be used in various
revisions as the best available scheme of tests for determining the
“level of intelligence,” at least verbal “intelligence.” But nearly all
qualified authorities now concede that even the latest editions of the
scale are far from perfect either in internal construction or in
standardization.

The history of the extension and use of the group tests of intelligence
parallels closely the development and application of the Binet
scale.

We have been assured that “intelligence” can be just as accurately
measured by so-called “group tests of intelligence,” as the depth of a
well or a river can be measured by a physical measuring rod, and that
children and adults can be accurately classified with respect to intelligence
by group tests. One writer states that subjects from first grade
level to university level can be accurately rated in intelligence by means
of his group scale alone, and that any first grade teacher can on the
first day of school after a 20-minute examination classify her pupils
in regard to their intellectual ability, and section them accurately
for the purpose of instruction. We have also been told that intelligence
can be measured more accurately by group tests than by individual
tests (the Binet Scale). Usually these claims are not based on
mere assertion, pure ipse dixits. They have often been supported
by the finding of high correlation coefficients between the group tests
and the Binet Tests or some other criterion of “native ability” or
“intelligence,” although it must be confessed that sometimes numerous
correlations which have been exceedingly low or quite negative have
been ignored or conveniently forgotten. In many cases, however, the
claims made have been based upon the irresistible tendency of all
propagandists and test publishers to exaggerate, and of some test
designers to advertise the superior virtues of their own wares.

Claims of this character are partly responsible for the amazingly
rapid introduction of group intelligence testing into the lower and
higher schools and into the institutions for dependents or defectives
throughout the country, and the widespread practice of classifying or
sectioning pupils solely or partly on the basis of the scores or IQ’s
obtained from group intelligence tests.

But the history of the Binet movement is now repeating itself. In
the wake of the period of exaggeration there has recently supervened a
period of analytical examination, and trenchant criticism of the
assumptions, claims, values, validity, and uses of the group tests of
intelligence. This phase of “reaction” should be welcomed and
encouraged instead of being ignored, belittled or resisted, as was done
by some of the Binet devotees in the early days of the examination
and critique of the Binet Scale, who seemed to regard the scale as
something immutable and sacrosanct. Practically all the early
criticisms of the Binet Scale have been substantiated. It is only by
searching and unhampered criticism that we shall eventually be able
correctly to appraise the extent of the imperfections, the inevitable
limitations, and the legitimate uses which may be made of existing
group tests of intelligence.

The following study of the agreement obtained in the intelligence
rating of the first grade pupils in the Miami University practice school
by the Pressey Primer, the Myers Mental Measure, the Detroit First
Grade Intelligence Test, the Stanford Binet-Simon Scale and the
teacher’s estimate, was carried out between February 24 and April 14,
1922. One group test was given on each Friday morning, while the
Binet testing was done on the three following Friday mornings. All
of the group testing was done by my assistant, Miss Mildred Rothhaar;
while the Stanford-Binet testing was done by six students who were
taking the Psycho-Clinic Practicum offered by the Bureau, or by Miss
Rothhaar or by myself. The students had had experience 3 hours per
week in giving the Stanford-Binet under critical supervision since the
beginning of the school year. The scoring of the group tests and the
computation of the results were done by the students under the supervision
of my assistant, who checked over many of the results, including
all of the correlation coefficients which were computed independently
by two students (on a Burrough’s Calculator). In the experience of
the writer, correlation coefficients computed only once cannot be
implicitly accepted as accurate whether done by student or
psychologist.
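
The practice of computing each coefficient twice can be illustrated with a short sketch. It assumes the Pearson product-moment coefficient named in the treatment of the results and computes it by two algebraically equivalent routes, then checks that they agree. The function names and the six pairs of scores are hypothetical and are introduced here only for illustration; they are not the study's data, nor a record of how the two students worked.

```python
import math


def pearson_r_deviation(xs, ys):
    """Product-moment r computed from deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    return sum_xy / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))


def pearson_r_raw(xs, ys):
    """The same coefficient from raw sums, as an independent second route."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))


# Hypothetical raw scores for six pupils (NOT data from the study).
group_test = [12, 18, 25, 31, 40, 44]
binet_raw = [60, 66, 71, 70, 84, 90]

r_first = pearson_r_deviation(group_test, binet_raw)
r_second = pearson_r_raw(group_test, binet_raw)

# Accept the coefficient only if the two computations agree.
computations_agree = math.isclose(r_first, r_second, abs_tol=1e-9)
print(round(r_first, 3), round(r_second, 3), computations_agree)
```

A disagreement between the two values would point to an arithmetic slip in one of the routes, which is the kind of error a single computation cannot reveal.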

The first grade critic teacher who estimated the intelligence of the
pupils was a college graduate of considerable experience in the lower
grades who had pursued various courses on intelligence and educational
tests. She was asked to rank each pupil in the order of intelligence,
assigning to each an arbitrary score on the basis of 100 for the ablest
pupil. She was asked to consider the various factors usually emphasized
in the attempt to estimate children’s intelligence (namely, native
ability or wit; demonstrated capacity or alertness or initiative, as seen
in and out of school, presence of physical disease or defects, specific
mental disabilities; social and educational advantages or disadvantages;
age). She was not given the results of the tests until her estimates
had been filed, but she observed the pupils during the testing,
and had available the results of prior tests for a few of the pupils.
She also ranked the pupils according to their proficiency in the school
work, but no use is made of these estimates in this article.

The maximum number of pupils tested by any one scale was 42,
and by all the scales, 34. The comparisons are based upon those 34
(20 boys and 14 girls), whose average and median ages were, respectively,
6.8 and 6.6. The modal age was 6.9.


Treatment of the Results


1. The correlation coefficient by the Pearson unabbreviated
product-moment method (r) was computed between the scores
of the different tests, and the teacher's estimates and the test scores,
Table I. In the case of the Binet both the absolute scores and the
IQ's have been used, while only the absolute or raw scores have been
used in the group tests. Our earlier analysis of the correlation coefficients
between the Binet and Pressey showed sharp discrepancies
between the r based on the IQ, and the r based on the absolute scores.¹
Parallel discrepancies occur in this investigation, as will be seen
presently, the smaller coefficients being derived in both investigations
from the use of the IQ figures. It is evident that the conclusions
drawn must be based on the raw scores, which represent the ultimate
figures. Naturally the IQ values for the different tests are quite
discrepant and incommensurable, as the tests are differently scaled.
The Pressey is scaled on a maximum score of 100 points, the Detroit of
60 points, and the Myers of about 140 points.
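The computation in step 1 can be sketched as follows. This is a minimal illustration, not the author's worksheet: the score lists are hypothetical, and the probable error uses the conventional formula of the period, PE = 0.6745(1 - r²)/√N, which the article does not state and is assumed here.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation computed from deviations about the means."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equally long score lists with at least 2 pupils")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def probable_error_of_r(r, n):
    """Assumed classical probable error of r: 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# Hypothetical raw scores for six pupils on two tests (not the study's data).
test_a = [12, 15, 9, 20, 17, 11]
test_b = [30, 41, 28, 52, 40, 35]
r = pearson_r(test_a, test_b)
pe = probable_error_of_r(r, len(test_a))
print(f"r = {r:.3f}, PE = {pe:.3f}")
```

Under this assumed formula, r = .531 with 34 pupils gives a probable error of about .083.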

2. The average difference in the rank order of the pupils, the range
of the rank differences, and the percentage of rank differences exceeding
10 steps, were ascertained between the absolute and relative (IQ)
scores in the tests, and the teacher's estimates, Table II. The figures
representing the average differences are subject to a certain amount of
uncorrectable error due to the existence of identical scores.
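The three quantities named in step 2 can be sketched in a few lines. The treatment of identical scores is an assumption: tied pupils are given the mean of the ranks they occupy, which would account for half-step differences, but the article does not say that this was the procedure used for Table II. The function names are illustrative.

```python
def midranks(scores):
    """Rank pupils from highest score (rank 1) down; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def rank_difference_summary(scores_a, scores_b, threshold=10):
    """Average, range, and percentage of rank differences exceeding `threshold` steps."""
    ranks_a, ranks_b = midranks(scores_a), midranks(scores_b)
    diffs = [abs(a - b) for a, b in zip(ranks_a, ranks_b)]
    return {
        "average": sum(diffs) / len(diffs),
        "range": (min(diffs), max(diffs)),
        "pct_exceeding": 100.0 * sum(d > threshold for d in diffs) / len(diffs),
    }
```

With 34 pupils the largest possible difference is 33 steps, so the threshold of 10 steps marks a displacement of roughly a third of the class.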


¹ Wallin, J. E. Wallace: A Comparison of Three Methods for Making the
Initial Selection of Presumptive Mental Defectives. School and Society, 1921,
pp. 31-45.

3. The percentage of agreements was ascertained in the placement
of the pupils in the first (highest), second, third, and fourth quartiles
by the different tests and teachers' estimates, Table III.

Here, again, the percentages are subject to an error because of
the existence of identical scores. There seemed to be no satisfactory
way of eliminating these errors, hence the identical scores were
arranged in rank order according to chance.
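A minimal sketch of the quartile comparison in step 3 follows. It assumes that agreement is expressed as the number of pupils placed in a given quartile by both rankings, divided by the size of that quartile; the article does not spell out the denominator. Identical scores are ordered by chance, as the text describes, here with a seeded random generator so the result is repeatable. All names are illustrative.

```python
import random

def quartile_labels(scores, sizes, rng):
    """Assign quartile 1 (highest) to 4 by rank; identical scores are ordered by chance."""
    if sum(sizes) != len(scores):
        raise ValueError("quartile sizes must add up to the number of pupils")
    order = list(range(len(scores)))
    rng.shuffle(order)                      # random order first ...
    order.sort(key=lambda i: -scores[i])    # ... stable sort preserves it within ties
    labels = [0] * len(scores)
    start = 0
    for quartile, size in enumerate(sizes, start=1):
        for i in order[start:start + size]:
            labels[i] = quartile
        start += size
    return labels

def quartile_agreement(scores_a, scores_b, sizes, seed=0):
    """Per cent of each quartile's pupils placed in that quartile by both rankings."""
    rng = random.Random(seed)
    qa = quartile_labels(scores_a, sizes, rng)
    qb = quartile_labels(scores_b, sizes, rng)
    agreement = {}
    for quartile, size in enumerate(sizes, start=1):
        both = sum(1 for a, b in zip(qa, qb) if a == quartile and b == quartile)
        agreement[quartile] = 100.0 * both / size
    return agreement
```

For the 34 pupils of this study the sizes would be `[9, 8, 9, 8]`, so that 3 shared pupils in the first quartile correspond to 33.3 per cent and 1 shared pupil in the second to 12.5 per cent.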

4. The differences were computed in the age-rating obtained by
the pupils in the Pressey, Myers, Detroit and Stanford-Binet, Table
V. Tentative norms for the Detroit Tests in terms of age and months
were sent me by Miss Anna M. Engel, based on the examination of
5039 children tested in June, 1921. For the Pressey and Myers,
the scores were interpolated between the whole-year norms supplied
with the tests. Extrapolated scores were used only when the values
were merely slightly below or above the liminal values.

In the case of the Pressey it was felt that the interpolated values
would be more usable than the percentiles which accompany the
norms. A margin of error necessarily results from the use of the
interpolated and extrapolated norms.
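The interpolation described in step 4 might look like the sketch below. Linear interpolation between adjacent whole-year norms is an assumption, as is the quarter-year limit on extrapolation; the article says only that extrapolation was used for values slightly beyond the liminal norms. The norm table is hypothetical.

```python
def age_rating(score, norms, max_extrapolation=0.25):
    """Convert a raw score to an age by linear interpolation between whole-year norms.

    norms: list of (age_in_years, norm_score) pairs with scores rising with age.
    Scores just outside the table are extrapolated from the nearest segment;
    scores further out return None (no rating possible).
    """
    norms = sorted(norms)
    for (a0, s0), (a1, s1) in zip(norms, norms[1:]):
        if s0 <= score <= s1:
            return a0 + (score - s0) * (a1 - a0) / (s1 - s0)
    if score < norms[0][1]:
        (a0, s0), (a1, s1) = norms[0], norms[1]
    else:
        (a0, s0), (a1, s1) = norms[-2], norms[-1]
    age = a0 + (score - s0) * (a1 - a0) / (s1 - s0)
    if norms[0][0] - max_extrapolation <= age <= norms[-1][0] + max_extrapolation:
        return age
    return None

# Hypothetical whole-year norms: (age, score expected at that age).
example_norms = [(6, 20), (7, 30), (8, 42)]
print(age_rating(25, example_norms))
```

Returning `None` for scores far outside the norms mirrors the situation in which a pupil's age-rating could not be determined for lack of age norms.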


Statement of Experimental Results
The Coefficients of Correlation

Before analyzing the experimental data let us advert to the possible
objection that the correlation figures obtained must necessarily be
quite unreliable because of the limited number of cases. True. But
the situation we present is absolutely typical. Tests are constantly
being given throughout the country to classes of pupils as actually
constituted, be they large or small. If the tests have any value at
all they must be of service in the classification of small classes as well
as large classes. Moreover, the reliability of the r's, especially those
based on the Binet absolute scores, may be assumed from the PE's,
Table I, to be quite high. The r's based on the Binet absolute are
from about 5 to 10 times as large as the PE's.

Based on the Binet IQ the r's range from 0.10, with the Detroit,
to 0.65, with the teacher. With the Pressey the r is 0.15, compared with
0.03 found by the Ayres abbreviated method in the previous investigation
to which reference has been made.





Table I. Coefficients of Correlation (r) and Probable Error of r (PE)
Between the Binet, Pressey, Myers, and Detroit Tests, and the
Teacher's Ranking

Tests compared                      r       PE
Binet absolute and Pressey        .456     ....
Binet absolute and Myers          .667     ....
Binet absolute and Detroit        ....     ....
Binet absolute and teacher        .487     ....
Binet IQ and Pressey              .158     .112
Binet IQ and Myers                .531     .083
Binet IQ and Detroit              .105     .114
Binet IQ and teacher              .657     .066
Pressey and Myers                 ....     ....
Pressey and Detroit               .508     ....
Pressey and teacher               ....     ....
Myers and Detroit                 ....     ....
Myers and teacher                 ....     ....
Detroit and teacher               .297     .105

The figures in the group tests represent raw or absolute scores.


Based on the Binet absolute scores the r's range from 0.44, with
the Detroit again, to 0.67 with the Myers. Myers has reported a
much higher correlation between his scale and the Stanford Binet,
namely “about .80 within each grade from the first to the eighth.”
With this group of pupils the correlation with the Pressey was only
0.45, as compared to 0.73 which we previously obtained with a larger
group of children who averaged older in age.

In the case of the group tests (absolute scores), the lowest correlation
is 0.38, between the Pressey and the Myers, and the highest 0.50,
between the Pressey and the Detroit. The highest correlation for
any single group test is between the Myers and the Binet absolute,
0.67, and between the Myers and the teacher's ranking, 0.56.

Based purely upon the size of the r obtained from the teacher's
ranking, the Binet and the Myers rate the pupils the most accurately,
while the Detroit is decidedly the most inaccurate, the difference
between the Binet absolute and the Detroit amounting to 0.19. Based
purely upon the size of the r obtained from the Binet absolute scores,
the Myers ranks highest, while the teacher's judgment is superior
to the Pressey and Detroit. The difference between the Myers and
the Detroit amounts to 0.24.

While these correlations cannot be considered to be very high,
with possibly one or two exceptions, the analyses which follow seem to
force us to the admission that, small as they are, they are nevertheless
fictitiously high, possibly because of the almost consistent tendency
Table II. Differences in Rank Order

                                          Average     Range of rank       Per cent of rank
Tests compared                            rank        differences         differences
                                          difference                      exceeding 10 steps

Between Rankings Based on Absolute Scores

Pressey and Myers                          ....       From 0 to 22            41.1
Pressey and Detroit                        ....       From 0.5 to 17          32.3
Myers and Detroit                          ....       From 0 to 20.5          38.2
Binet and Pressey                          ....       From 0 to 28            29.4
Binet and Myers                            ....       From 0 to 22            11.7
Binet and Detroit                          7.76       From 0 to 22            29.4
Teacher and Pressey                        8.7        From 0 to 28            41.1
Teacher and Detroit                        9.04       From 1 to 26.5          47.0
Teacher and Myers                          8.06       From 0.5 to 25          38.2
Teacher and Binet                          8.23       From 0 to 21            38.2
Average of three group tests and Binet     7.56       From 1 to 26.5          20.6

Between Rankings Based on Relative Scores (IQ's)

....                                       ....       ....                    38.2
Binet and Myers                            6.88       ....                    23.5
....                                       ....       ....                    38.2
Binet and Detroit                          7.17       From 0.5 to 22.5        23.5
....                                       8.20       ....                    29.4
Myers and Detroit                          8.03       From 0.5 to 24          32.3

Between Rankings Based on Binet Relative (IQ) and Group Absolute Scores, or the
Teacher's Rating

Binet and Myers                            7.79       From 0.5 to 22          29.4
Binet and Pressey                          9.04       From 0.5 to 31.5        44.1
Binet and Detroit                          ....       From 0 to 25            50
Binet and teacher                          ....       From 0 to 22            35.3


of the Binet to rate the pupils higher than the group tests. Thus
29 pupils are rated higher by the Stanford than by the Myers, while
the reverse is true for only one case. The total excess rating given
by the Binet for the 29 pupils amounts to 43.5 years, compared with
an excess of 0.2 year for the one case which is rated higher by the
Myers. (See Table V for similar data relating to the other tests.)
When one scale grades consistently high and another scale consistently
low, we may obtain a high correlation coefficient between them
although there would be a pronounced age difference in the rating
between the two scales, assuming that the differences were large. In
other words, a high correlation coefficient does not demonstrate the
accuracy of the ratings obtained by a comparative scale in terms of a
standard of accepted or demonstrated accuracy. As I have previously
remarked, “the correlation coefficient has often, in my experience,
proved to be a clever device for concealing the true state of affairs,
a fictitious refuge in our search for security. . . . Plus and minus
deviations are not neutralized or concealed when individual cases
are compared . . . Even a high correlation coefficient cannot
demonstrate the accuracy of a particular measurement.”¹
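The point that a high correlation can coexist with large disagreement in age-rating can be illustrated with a small sketch. The age-ratings are hypothetical: scale B is simply scale A lowered by a constant 1.5 years, a deliberately extreme case chosen for illustration.

```python
import math

def pearson_r(x, y):
    """Product-moment correlation computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical age-ratings in years for five pupils on two scales.
scale_a = [6.0, 6.5, 7.0, 7.5, 8.0]
scale_b = [age - 1.5 for age in scale_a]   # scale B rates every pupil 1.5 years lower

r = pearson_r(scale_a, scale_b)
mean_gap = sum(a - b for a, b in zip(scale_a, scale_b)) / len(scale_a)
print(f"r = {r:.2f}, mean difference = {mean_gap:.1f} years")
```

The two scales order the pupils identically, so the coefficient is perfect, yet every pupil is rated a year and a half apart. The coefficient is blind to a constant shift between scales, which is why the individual age differences have to be examined separately.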

Agreement in Rank Order

The tests are so paired in Table II as to permit of 21 comparisons
of the rank differences in the placement of the pupils by the different
tests or by the teacher and the tests. The analysis will be restricted
to a few of the comparisons contained in the table.

By examining the second column, “range of rank differences,”
which gives the smallest and the largest differences found in the rank
placement of any given pupil by two tests (based on both the absolute
and the relative scores) or by one test and the teacher's score, it will
be seen that some pupils were assigned exactly the same rank order,
while the maximal differences in the ranking varied, in the different
comparisons, from 17 steps to 31.5 steps out of 33 possible steps of
difference. To illustrate: The maximum difference in the ranking of
the same pupil, based on the absolute scores, amounted to 22 steps
between Pressey and Myers, Binet and Myers, and Binet and Detroit;
and to 28 steps between the teacher and Pressey, and 21 steps between
the teacher and Binet. Based on the relative scores it amounted
to 25.5 steps between Pressey and Detroit, and 26 steps between
Binet and Pressey. Between the Binet relative and the Pressey
absolute scores it reached 31.5 steps. The Pressey ranked this
pupil 31.5 steps higher than did the Binet. The smallest maximum
range in any of the comparisons is found between the Pressey and
Detroit absolute, 17 steps, followed by the teacher and Binet absolute,
21 steps.

¹ Wallin, J. E. Wallace: The Theory of Differential Education as Applied to
Handicapped Pupils in the Elementary Grades. Journal of Educational Research,
1922, pp. 200-224.

The last column gives the ratio of subjects whose rank differences
by any two tests exceeded 10 steps. The smallest percentage is for
the Binet and Myers (absolute scores), followed by the Binet and
Myers and Binet and Detroit (relative scores). In the case of the
Binet and Myers absolute, 11.7 per cent of the pupils differed by more
than 10 steps; in the case of the Binet and Myers and Binet and Detroit
relative scores the corresponding figure was 23.5 per cent. When
the Binet relative and the Detroit absolute scores, and the teacher's
and Detroit absolute scores are compared, the rank difference exceeds
10 steps for 50 and 47 per cent, respectively, of all the pupils. When
the Binet and average of the 3 group tests are compared, the difference
in the ranking exceeds 10 steps for 20 per cent of the pupils.

[Graph: Correspondence between the order in which the pupils ranked by
the Binet and by the average of the three group tests (absolute scores).
Columns: group test rank, Binet test rank.]

Based upon the averages in the first column, the smallest average
difference in the ranking of all the pupils by any two tests amounts to
5 steps, between the Binet and Myers absolute scores, while the largest
amounts to 10.5 steps, between the Binet relative and Detroit absolute
scores. The average difference amounts to over 7.0 steps in all except
four of the 21 comparisons. The average rank difference between
the teacher's rating and the Binet IQ is 6.8 steps, and between the
teacher's rating and the Binet absolute 8.2 steps. Based on the Binet,
the smallest average difference is given by the Myers and Detroit,
the average difference in rank order based on the absolute scores
being 5.0 and 7.76, respectively, and based on the relative scores,
6.88 and 7.17, respectively. If the group tests are averaged the
difference with the Binet absolute still amounts to 7.5 steps.

The difference in the ranking of the pupils by two tests appears 
strikingly in the graph, which gives the difference in the ranking of 
each subject between the Binet and the average of the three group 
tests. 

The extent of the disagreements between the different tests is 
also shown by an analysis of the 

Agreement in Quartile Grouping 

If the teacher were to section the pupils into 4 groups according to
test ability, she would find the greatest amount of agreement in the
two extreme quartiles, containing the best and the poorest pupils,
Table III. But even so, we find that in only 2 of the 10 possible
comparisons between the rankings of two tests, or of one test and the
teacher's rating, is there agreement on more than half of the pupils
who should be assigned to the highest or the lowest quartile. Pressey
and Detroit agree on 60.6 per cent and the teacher and Myers on
65.6 per cent for the highest quartile; and the Binet IQ and the Myers
on 62.5 per cent and the teacher and Binet IQ on 75 per cent, who
should be assigned to the lowest quartile. The lowest agreement is
between the Binet IQ and Detroit which agree on only 12.5 per cent
of the pupils for lowest quartile. Altogether the highest agreement is
between the teacher and the Binet IQ, 44.4 and 75 per cent for the
first and fourth quartiles; and possibly between the teacher and Myers,
Table III. Agreements Found Between the Group Tests in the Quartile
Grouping of the Children

                                              Per cent placed in
Tests                                         first       second     third      fourth
                                              (highest)   quartile   quartile   quartile
                                              quartile

By Pressey and Myers                           33.3        12.5       ....       37.5
By Pressey and Detroit                         60.6        12.5       ....       25.0
By Myers and Detroit                           33.3        25.0       ....       37.5
By Binet and Myers                             33.3        12.5       33.3       62.5
By Binet and Detroit                           44.4        12.5       22.2       12.5
By Binet and Pressey                           33.3        25.0       11.1       37.5
By teacher and Binet                           44.4        12.5       33.3       75.0
By teacher and Myers                           66.6        25.0       11.1       50.0
By teacher and Pressey                         33.3        25.0       22.2       50.0
By teacher and Detroit                         44.4        25.0       11.1       25.0
By Pressey, Myers and Detroit                  22.2         0          0         25.0
By Pressey, Myers, Detroit and Binet            0           0          0         12.5
By Pressey, Myers, Detroit and teacher         11.1         0          0         25.0
By Pressey, Myers, Detroit and Binet IQ         0           0          0          5.8
By average of three group tests and teacher     8.8         8.8        5.8       11.7
By average of three group tests and Binet       8.8         5.8        5.8        8.8

Based on the 34 pupils who were given all the tests. Nine pupils were assigned
to the first quartile, 8 to the second, 9 to the third, and 8 to the fourth.

The absolute scores are used in the group tests and the relative (IQ) in the
Binet.

65.6 and 50 per cent, respectively. Using the Binet as the standard
of reference, the teacher and the Myers show the greatest amount of
agreement.

The three group tests do not agree on more than 25 per cent of
the cases; the three group tests and the Binet IQ on more than 12.5
per cent, and the three group tests and the teacher on more than
25 per cent, all in the fourth quartile. The 3 group tests, the Binet
IQ and the teacher agree on only 5.8 per cent of the cases in the fourth
quartile; the average of the 3 group tests and the teacher on only
8.8 per cent in the first and 11.7 per cent in the fourth quartiles; and
the average of the 3 group tests and the Binet IQ on only 8.8 per cent
in the first and fourth quartiles. (The figures differ only slightly
in these two comparisons if the Binet absolute scores are substituted.)

Turning to the middle quartiles, we find that in only three of the
ten comparisons between two tests or between one test and the teacher
is there agreement on as many as one-third of the pupils who should
be assigned to either quartile, namely between the Pressey and Detroit,
the Binet and Myers, and the teacher and Binet, all in the third
quartile. In about half of the comparisons the tests agree on less than
13 per cent of the pupils. None of the tests possesses any distinct
superiority in selecting the pupils for the second and third quartiles,
but the agreement with the Binet is greatest for Myers and the
teacher. Neither the three group tests, nor the three group tests and
the Binet, nor the three group tests and the teacher agree on a single
child for the middle quartiles. The average of the three group tests
and the Binet agree on only 5.8 per cent of the pupils, for the middle
quartiles, and the average of the three group tests and the teacher
on 8.8 and 5.8 per cent respectively, for the second and third quartiles.

The disagreements appear equally evident when the scores are
translated into age-ratings and critically analyzed.

Agreement in Intelligence Age-rating

When the average intelligence ages by the different tests are
compared in Table IV, there is, indeed, fair agreement. The average
Binet intelligence age is 7.0 years, which is 0.2 year higher than the
average and 0.4 year higher than the median chronological age. The
Detroit gave practically the same result, while the Pressey and Myers
ratings were 0.3 and 0.8 of a year lower. The differences between
the Binet averages and the averages in the group tests are negligible
except in one case.

Table IV. Average Intelligence Age According to the Different Tests

                     Binet    Detroit    Pressey    Myers
Number of cases       34        34         20        30
Average age           7.0       6.9        6.7       6.2

But the extent to which the individual intelligence age-ratings
obtained by the tests agree or disagree cannot be inferred from the
average intelligence ages, but only by computing the difference in
the age-ratings obtained for each pupil by the different tests. Table V
gives the range of differences in decimals of a year between the age-ratings
obtained by the subjects in the two tests compared; i.e., the
range from the subject who showed the smallest age difference to
the one who showed the largest age difference in the tests compared;
the percentage of cases who differed by 2 years or more and by 1
year or more in the intelligence age-rating obtained in the two tests
compared; and the number of subjects who rated higher in each of
the tests compared, together with the total and the average amount of
difference in the intelligence rating in terms of years. This table
demonstrates with peculiar force how widely discrepant the verdict
of two tests may be on the same pupils. Thus the maximum difference
in the rating of any pupil in this group amounted to 3.1 years as
between the Myers and Stanford, 2.8 years as between the Detroit and
Stanford, and Pressey and Stanford, 2.5 years as between the Pressey
and Myers, 1.9 as between the Pressey and Detroit, and 1.7 as between
the Myers and Detroit. These figures, of course, represent the
extreme cases. Nevertheless the table shows that the rating differs
by 2 years or more for 24 per cent of the pupils when the Pressey
and Myers are compared, for 16.4 per cent when the Pressey and Stanford
are compared, and for 13.3 per cent when the Myers and Stanford
are compared. The difference amounts to 1 year or more for 83.3
per cent of the cases when the Myers and Stanford are compared,
for 29.4 per cent when the Detroit and Stanford are compared, for
43.3 per cent when the Pressey and Stanford are compared, and for
23 per cent when the Pressey and Detroit are compared, which showed
the highest agreement from the point of view of this criterion. When
test results differ by 1 year or more on from 23 to 83.3 per cent of
children who average only 6.8 years of age the suspicion is irresistible
either that the norms are inaccurate, or that the tests are imperfect or
worthless, or that the tests measure different things, or that the
measurement of human intelligence is, fundamentally, so intricate
and difficult that it cannot be accurately done by group tests. It
was my conviction that the accurate measurement of intelligence by
group tests would be far more difficult in the case of young children
than in the case of older ones.¹ But this seems not to be true, judging
by the results of Stenquist based on five group tests of intelligence
given to a class selected at random from the upper elementary grades
of a school in New York City,² and by the results of four intelligence
tests given to the sixth, seventh and eighth grades in the Miami
University practice school. The difference between the intelligence
age-rating obtained in a single test compared with the average of
all five tests varied from 2.08 years to 2.68 years for over 18 per cent
of the New York pupils. The corresponding differences between
the rating obtained from the individual scales, having regard for the
plus and minus deviations, varied from 3.10 years to 4.08 years.
The median difference in the intelligence age-rating of the upper grade
pupils in the Miami practice school obtained by the Stanford-Binet
and the Illinois Examination (the only scales with directly comparable
norms) amounted to 11 months, varying from no difference to 3.8
years. The differences ranged from 13 months to 3.8 years in 43.8 per
cent of the cases and from 2 years to 3.8 years in 17 per cent.³

¹ School and Society, 1921, p. 34.
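The quantities that Table V is described as reporting can be computed pupil by pupil, as in the sketch below. The function and key names are illustrative, and the age-ratings in the usage note are hypothetical, not the study's data.

```python
def age_difference_summary(ages_a, ages_b):
    """Summarise per-pupil differences (in years) between two tests' age-ratings."""
    diffs = [a - b for a, b in zip(ages_a, ages_b)]
    n = len(diffs)
    abs_diffs = [abs(d) for d in diffs]
    return {
        "range": (min(abs_diffs), max(abs_diffs)),
        "pct_1_year_or_more": 100.0 * sum(d >= 1.0 for d in abs_diffs) / n,
        "pct_2_years_or_more": 100.0 * sum(d >= 2.0 for d in abs_diffs) / n,
        "higher_on_a": sum(d > 0 for d in diffs),      # pupils rated higher by test A
        "higher_on_b": sum(d < 0 for d in diffs),      # pupils rated higher by test B
        "excess_on_a": sum(d for d in diffs if d > 0),   # total years by which A exceeds B
        "excess_on_b": -sum(d for d in diffs if d < 0),  # total years by which B exceeds A
    }
```

For example, `age_difference_summary([7.0, 8.0, 6.0, 9.0], [6.0, 5.5, 6.5, 9.0])` reports that half of these four hypothetical pupils differ by a year or more, although the two averages (7.5 and 6.75) lie within a year of each other.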

The results obtained by Stenquist, Guiler, and myself from experimental
investigations of the amount of disagreement found between
different tests are quite disappointing and must seriously disturb the
confident belief, so generally accepted at present, that group tests give
an accurate measurement of “general intelligence” and a highly
reliable and accurate means of sectioning pupils according to their
ability for the purpose of instruction. If this belief is justified the
conclusion seems unavoidable that the tests used by the above three
writers (and the tests used by other workers which show large discrepancies)
measure qualities which are so different as to be practically
incommensurable.

At any rate, so long as different tests which are assumed to measure
the same thing (e.g., “general intelligence,” or “verbal intelligence,”
or “non-verbal intelligence,” or “native ability,” or “alertness”)
give glaring discrepancies in the rating of any considerable number of
subjects, we cannot evade the issues: Is it possible to obtain accurate,
reliable measures of mental traits by means of group tests? If so, is it
possible to obtain such a comprehensive measure of “general intelligence”
by a brief group test based on the tests of a very limited number
of traits arranged in an artificial setting, as will accurately and adequately
reflect intelligence as it manifests itself in varying degrees of
successful adjustment to the work of the school, mart, factory, farm,
office or playground? If so, do the existing tests alleged to measure
“general intelligence” (or “verbal” or “non-verbal” intelligence, as
the case may be) give us accurate measures of intelligence viewed in
this broad, comprehensive way? Or do they give us measures of
different mental qualities or traits and of very limited aspects of
intelligence as a whole? The solution of questions such as these is,
in my judgment, now more important than the devising of new
tests or the wholesale application of tests.

² Stenquist, John L.: Unreliability of Individual Scores in Mental Measurements.
Journal of Educational Research, 1921, pp. 347-354.

³ These figures are based on a table supplied me by Guiler, who has since
published an analysis of the disagreements of the tests based on the IQ's: Guiler,
Walter S.: How Different Mental Tests Agree in Rating Children. The Elementary
School Journal, 1922, pp. 734-744. It is unfortunate that the writer did not
analyze the results for the absolute scores also.
It was not possible to compare the subjects whose age-rating could not be determined because of the lack of age norms for
certain ages.













THE CONSTANCY OF INTELLIGENCE QUOTIENTS
WITH BORDERLINE AND PROBLEM CASES

V. A. C. HENMON,
University of Wisconsin

HELEN M. BURNS

Supervisor of Special Classes, Madison, Wisconsin

The constancy of intelligence quotients is a matter of such practical
moment that significant data should be promptly reported. This
is particularly true of borderline cases and such others as, for one
reason or another, arise in a school system and are referred to the
supervisor of special classes or other examining agency for diagnosis.
The cases whose intelligence quotients lie between 60 and 80 or between
65 and 75 present a peculiarly troublesome problem to the diagnostician
who is expected to make disposition of them or make recommendations
on a basis of which proper disposition can be made. The
reliability and constancy of the intelligence quotient in these cases
is of very special importance. For all we know the validity, reliability
and constancy of tests may be greater or less with this group than with
those very definitely defective or definitely superior. Few reports of
those that have recently appeared deal specifically with this group.¹
This study deals with 72 pupils who have been referred to one of the
writers, Miss Burns, Supervisor of Special Classes in the Madison
Schools, and who have been retested one or more times. All of the
examinations have been made either by Miss Burns or by Dr. Elizabeth
L. Woods, State Supervisor of Special Classes. Both are trained and
experienced examiners. In 59 of the cases, both examinations were
made with the Stanford Revision. In 18 cases, the first test was
made with the Goddard Revision of 1911 and the second with the
Stanford Revision. The distribution of the 77 cases by intelligence
quotients is as follows:


¹ See summary of 8 studies by Rugg and Colloton, Constancy of the Stanford-
Binet IQ as shown by retests. Journal of Educational Psychology, Vol. XII,
No. 6, September, 1921; and subsequent articles in the same journal by Wallin,
Vol. XII, No. 8, October, 1921; by Stenquist, Vol. XIII, No. 1, January, 1922;
and by Gordon, Vol. XIII, No. 6, September, 1922.
Table 1

IQ            Stanford      Stanford       Goddard       Stanford
              first test    second test    first test    second test

....               1             2              1             1
....               6             4              0             1
60-70             14            17              3             6
70-80             14            15              5             2
80-90             11          ....              3             4
90-100             7             0              2             2
100-110            6             5              1             2
110-120            0             0              3             0
Median IQ         70            73             78            73

This indicates the nature of the cases studied, a large share of
them having intelligence quotients between 0.60 and 0.90. Some of
them are above 100 but were included since they were referred for
examination.

The constancy of the intelligence quotients may be shown in the
usual ways, (1) by the correlations between the first and second
tests, (2) by the average or median difference between the test and
retest, and (3) by the limits of difference which would include 50 per
cent of the cases.

Table II. Correlation Table for 59 Stanford Revision Retests
(IQ, Second Test)

The correlation table and coefficient of correlation for the 59 
cases tested both times by the Stanford Revision appear in Table II. 

The coefficient of 0.91 agrees well enough with those previously
reported. Terman with 428 cases found a coefficient of 0.93, Cuneo
and Terman for three groups of 25, 21, and 31 cases found coefficients
of 0.96, 0.94 and 0.85 respectively; Rugg and Colloton for 137 cases
report a coefficient of 0.84; Gordon for 44 cases reports a coefficient
of 0.84; while Stenquist for 274 cases found a coefficient of 0.72. In
view of the considerable differences in the heterogeneity of the groups
and hence the differences in variability, these coefficients are not comparable.
The determination of the average differences gives a better
basis of comparison.

The average difference for the 59 cases is 5.3 points IQ. For
the 18 cases given the Goddard Revision at the first test and the Stanford
Revision at the second test, the average difference is 9.0 points
IQ. All but 3 of the 18 cases in the second group show a loss. In
Rugg and Colloton's summary table there are two reports in which
the Goddard Revision was used in the first test and the Stanford
Revision in the second. Garrison with 62 cases found an average
difference of 4.66 points IQ, while Wallin for 120 cases found a difference
of 10.2 points IQ. Obviously our results are in much closer
agreement with those of Wallin.

The average differences where the Stanford Revision was used in
both tests, reported by Terman as 4.5 PE, by Garrison as 4.66, by
Poull as 4.0, by Rugg and Colloton as 4.7, are somewhat less than
the 5.3 points IQ which we have found. Our results in turn show
greater constancy than those of Gordon 6.8, Wallin 6.1, and Terman
and Stenquist 7.5.

By the formula $PE = .6745\,\sigma\sqrt{1 - r_{12}}$, with a correlation of
0.91 and a standard deviation of 15.6 the probable error of measurement
is 3.15 points. This agrees with the determinations of Otis,
Terman and Rugg who have found that the PE in terms of IQ is about
3 points.
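The arithmetic behind this figure can be checked directly. The sketch assumes the formula is the probable error of measurement, 0.6745 times the standard deviation times the square root of (1 - r), which is the reading that reproduces the reported value from the stated correlation and standard deviation. The function name is illustrative.

```python
import math

def probable_error_of_measurement(sd, r12):
    """Probable error of a single score: 0.6745 * sd * sqrt(1 - r12)."""
    return 0.6745 * sd * math.sqrt(1.0 - r12)

pe = probable_error_of_measurement(sd=15.6, r12=0.91)
print(f"PE of measurement = {pe:.2f} IQ points")
```

With a standard deviation of 15.6 and a correlation of 0.91 this comes to roughly 3.16 points, in line with the 3.15 given in the text and with the figure of about 3 points attributed to Otis, Terman and Rugg.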

The range of difference for the Stanford Revision cases runs
from -13 points to +15 points. The limits within which 50 per cent
of the cases lie fall at -6 points and +3 points. In this respect, our
results are very different from others. Rugg and Colloton state
that “for all studies the positive differences are nearly twice as
large as the negative differences.” In our results this is not the case.
Out of the 59 cases, 31 show a loss in the retest, the average loss being
6.3 points, while 24 cases show an average gain of 5.9 points, with
4 cases yielding an identical score. The median change is a loss of
1.75 points IQ and the average difference 4.8 points. The typical
result to be expected with such a group then is a loss rather than a
gain. This appears more markedly in the 28 cases whose first tests
gave IQ's between 60 and 80. Twenty of the 28 cases show an average
loss of 5.2 points, while 8 show an average gain of 3.7 points. The
median change with this group is a loss of 3.7 points. It is well
established that with feebleminded subjects the IQ's tend to decrease.
This is the evident tendency with borderline cases also and it is a very
significant fact in making provisions for them and predictions concerning
them.
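The retest statistics quoted above (number and size of losses and gains, median change, and the limits enclosing the middle 50 per cent of cases) can be gathered in one pass, as sketched below. The quartile rule is whatever `statistics.quantiles` applies by default, which is an assumption and need not match the authors' hand method. The IQ values in the usage note are hypothetical.

```python
import statistics

def retest_summary(first_iq, second_iq):
    """Summarise IQ changes (second test minus first) for a retested group."""
    changes = [b - a for a, b in zip(first_iq, second_iq)]
    losses = [-c for c in changes if c < 0]   # sizes of the losses, as positive numbers
    gains = [c for c in changes if c > 0]
    lower, _, upper = statistics.quantiles(changes, n=4)
    return {
        "n_loss": len(losses),
        "n_gain": len(gains),
        "n_same": len(changes) - len(losses) - len(gains),
        "avg_loss": statistics.mean(losses) if losses else 0.0,
        "avg_gain": statistics.mean(gains) if gains else 0.0,
        "median_change": statistics.median(changes),
        "avg_abs_difference": statistics.mean([abs(c) for c in changes]),
        "middle_50_limits": (lower, upper),
    }
```

Called as `retest_summary([70, 75, 65, 80, 72, 68, 90, 77], [66, 75, 60, 83, 70, 69, 85, 74])`, it reports five losses, two gains and one unchanged score, with a negative median change, the same pattern of loss outweighing gain that the paragraph describes.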

Two other points remain to be noted. The effect of the length
of the interval between the first and second tests for those tested
in both instances by the Stanford Revision is shown to be as follows:

Interval                 Number of Cases    Average Difference
9 months-1 year                 4             7.0 points IQ
1 year-1½ years                21             4.1 points IQ
1½ years-2 years                9             .... points IQ
2 years-2½ years               20             5.2 points IQ
2½ years-4 years                9             6.2 points IQ

Total                          59             Av. 5.3

It is evident that within the rather narrow time limits here
involved, any effect of the length of interval is not revealed. The
number of cases is, of course, too small to show any general tendency.

The sex difference noted by Gordon¹ recently is not in evidence.
The average difference for 34 boys and 25 girls is the same, viz., 5.3
points. In the case of the boys, 19 differences were positive and 12
negative. In the case of the girls 12 differences were positive and
12 negative. In Gordon's results “most of the losses were with the
girls and most of the gains with the boys.”

In summary, our results for 59 cases, with intervals from 1 to 4
years, give a correlation of 0.91 between the first and second tests
and an average difference of 5.3 points. The probable error of
measurement is therefore about 3 points IQ. The significant point brought
out by the study is the evident tendency, with borderline cases, for
intelligence quotients to decrease, a tendency most marked in the
group whose initial IQ’s lie between 60 and 80.
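
The step from an average difference of 5.3 points to a probable error of about 3 points is not shown in the text. One plausible reading, offered here only as a sketch, treats the test-retest differences as normally distributed about zero and the two testings as equally fallible; the function name and the normality assumption are ours, not the authors'.

```python
import math

def probable_error_from_mean_difference(mean_abs_diff: float) -> float:
    """Probable error of a single IQ, estimated from the mean absolute
    test-retest difference.

    Assumptions (not stated in the source): the differences are normal
    with mean zero, and both testings carry the same error.
    """
    # For a normal variable, mean |x| = sigma * sqrt(2 / pi).
    sigma_diff = mean_abs_diff / math.sqrt(2.0 / math.pi)
    # The probable error is the half-width that contains 50% of a
    # normal distribution, i.e. 0.6745 sigma.
    pe_diff = 0.6745 * sigma_diff
    # A difference of two equally fallible scores has sqrt(2) times
    # the error of either score alone.
    return pe_diff / math.sqrt(2.0)

pe = probable_error_from_mean_difference(5.3)
print(f"probable error of one measurement: {pe:.2f} points IQ")
```

Under these assumptions the result comes out a little above 3 points, in line with the figure quoted. Because the median change in this group is a loss rather than zero, the zero-mean assumption is only approximate.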

* Gordon, Kate: Some retests with the Stanford-Binet Scale. Journal of
Educational Psychology, Vol. 13, No. 6, September, 1922.




NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


CONDUCTED BY LAURA ZIRBES¹


1. The Status and Future of “Behavioristic” Psychology.—There is
and, for some time, has been a need for a thorough and scholarly study of the
history and present status of “behaviorism” as a movement in
psychology. It would be expected of anyone undertaking this task
that nothing short of a penetrating analysis of the growth of many
“schools” or “systems,” not only in psychology, but in philosophy,
biology and other related subjects, would be completed before emphatic
pronouncements would be made. Roback² has exceeded expectations
in point of courage if not in matters of thoroughness in his
recent book. He has given Watson, “the enfant terrible of behaviorism,”
what he considers to be a well deserved spanking, and he has not
spared the rod in other cases. But perhaps many readers will feel that
Roback has been motivated by an impatience that led to the administering
of punishment before listening to the whole story of the offenders
or without considering evidence that other interested parties might
have contributed. He has treated behaviorism as a menace against
which a sharp polemic attitude must be taken. Perhaps many who
are, like the reviewer, unsympathetic with extreme behaviorism will
nevertheless feel that such an attitude is unwise even if not altogether
futile.

Roback’s book is based mainly on a series of controversial articles
and a few books, rather than upon an analysis of fundamental trends
represented in the work of recognized leaders, past and present. For
example, in treating the “antecedents of behaviorism” no mention is
made of Judd's writings on motor attitudes or Thorndike's discussions
of the evolution of the mind, and such extensive theories as those of
Washburn in Movement and Mental Imagery are barely mentioned. In
treating the “Varieties of Behaviorism” and other topics, we are given
a series of quotations, some of which may have been unwittingly made,
plucked from various sources, and interspersed with comments after
the fashion of the Literary Digest, instead of a comprehensive view of
movements within the science itself.

¹ Unsigned reviews prepared by Laura Zirbes.

² Roback, A. A.: “Behaviorism and Psychology.” Cambridge, University
Bookstore, 1923, pp. 284.

Part II of the book is a direct critical attack upon some of the
principles of behaviorism. The inadequacy of the Watsonian accounts
of memory and other types of learning and activity is presented
together with a less effective, because less important, attack upon the
logic of particular behaviorists. A large portion of the book—all of
Part III, which is the longest, and a section of Part II—is devoted to
discussions of the incompatibility of behaviorism and philosophy,
ethics, jurisprudence, medicine, religion, and the demands of life. We
are told that “Watson's substitute for thought is untenable in (the)
eyes of (the) law;” that the “religious consciousness is not reducible
to non-mental components,” etc. The behaviorist is not likely,
however, to be frightened from his position by these or other practical
difficulties in making himself understood. The book includes a
bibliography of books and articles bearing on the “issue between behaviorism
and psychology.” It includes an appendix on “Intelligence and
Intellect” and another on “How is Psychology Defined?” that seem
to have no significant relation to the main body of the book.

Psychologists will find in the book many admirable passages, a
useful list of references, and an interesting classification of types of
Pre-Behaviorisms, Behaviorisms Proper, Psycho-Behaviorisms, and
Nominal Behaviorisms. The elementary student will, of course, find such
a book unintelligible. A. I. G.


2. The Effectiveness of Visual Instruction.—We need scientific data
to ascertain just how much the effectiveness of instruction is
enhanced by the use of certain visual aids. Such auxiliary materials and
methods need to be compared and evaluated with reference to sound
educational criteria. The following phases of this problem were
studied in an investigation of the learning of seventh grade pupils:¹ (1)
The effectiveness of informational moving pictures in combination
with verbal instruction; (2) The value of simple drawings in creating
composite visual images; (3) The value of diagrams in developing
relatively abstract concepts; (4) The comparative effectiveness of
four different methods of presentation, viz.: Study of the printed
page, oral instruction by the teacher, silent observation of a film, and
observation of a film accompanied by a lecture or explanatory remarks.
The conclusions are based on the scores obtained by about 500 pupils
on 3 kinds of tests. The differences in effectiveness are noticeable but
so surprisingly small as to seem almost insignificant. An investigation
into pupil preferences shows that pupils’ choices favor the use of films.
We wonder whether the effects of various methods of presentation do
not vary much more with reference to permanence of impressions and
whether an investigation should not consider that possibility. The data
presented in this study show that visual aids in the form of pictures or
diagrams are more necessary when the material is foreign to the pupil’s
actual experience, or abstract in its nature. Further investigations are
needed and some should be carried on with younger children.

¹ Weber, Joseph J.: “Comparative Effectiveness of Some Visual Aids in
Seventh Grade Instruction.” Chicago, The Educational Screen, Inc., 1922,
pp. 131.

It is too bad that a study manifestly concerned with the effectiveness
of visual aids should neglect to provide in its published form so
many of the visual cues by means of which readers of such studies are
saved time and trouble.


3. The Biology and Psychology of Child Nurture.—This book is a
plea for the more intelligent understanding of the interlocking relations
of heredity and environment, and an application of such knowledge to
the related problems of eugenics, prenatal care and child culture.¹ It
is written from the standpoint of a physician, one who has a vision of
the tremendous social significance of the conservation of childhood.


4. The Scientific Method and Education.—This monograph² is
largely devoted to an expository treatment of the methods of the
scientist as distinct from the methods of the scholar. This is to furnish the
background for training the student in the “scientific method of
interrogating nature.” One wonders to what extent such a
reconstruction of the historical scientific method results in an overestimation
of the rôle played by inductive-deductive thinking as a conscious
organizing agency, and underestimates the extent and significance of
trial success and accidental discovery. Moreover, the reader whose
expectations have been raised by the attractive title cannot avoid
being disappointed at the omission of any concrete approach to the
classroom situation. Without minimizing the value of this
contribution, we can maintain that there is a real need at this time, not so
much for a further definition and analysis of the scientific method, as
for at least a preliminary experimental excursion into the
identification and evaluation of scientific method as it can be expressed in the
specific activities and materials of the classroom.

J. G. KUDERNA.

¹ Chapin, Henry Dwight: “Heredity and Child Culture.” New York, E. P.
Dutton and Company, 1922, pp. viii + 219.

² Sanford, Fernando: “How to Study, Illustrated Through Physics.” New
York, The Macmillan Co., 1922, pp. vi + 66.


5. Investigations of Typewriting and Stenography.—In the first of
these two monographs¹ stenographic ability is analyzed into its
important components or functions as a first step in the construction
and standardization of scales for the measurement of achievement in
shorthand. The technique of scale construction is evidence that the
author is thoroughly conversant with the history and present status
of educational measurement. The reasons for the selection of certain
test elements and procedures, and the rejection of others, are
interestingly stated. The series of tests includes an ingenious and original
test of reading ability in which comprehension is checked while rate
is measured. There are two forms of a test for speed of writing and
a 16-step scale for measuring the quality of shorthand penmanship,
which does credit to its forbears. Each of the ten equivalent
vocabulary tests consists of one hundred common words and fifty common
phrases, the selection of which was based on four well known vocabulary
studies and a comprehensive count of phrases or word groups made
by the author. This phrase study is significant, apart from its
immediate purpose. There is also a scale, similar to the Ayres Spelling
Scale, for measuring knowledge of shorthand word and phrase
characters or outlines.

In the same series is a monograph which deals with the
improvement of speed and accuracy in typewriting.² Four vocabulary
studies were used in determining the relative frequency with which
the various letters and other characters on the keyboard are used.
These data were used in connection with a study of errors in
typewriting, and close correlation between infrequency of use and frequency
of error is revealed. Other causes of error were also studied. The
relative abilities of the eight fingers and the two hands were studied
with interesting results. Finally the relative load assigned to each
hand and to each finger was studied. The author concludes from the
nature of the evidence that, inasmuch as the so-called standard or
universal keyboards have been arranged with reference to no
discoverable criteria whatsoever, greater speed and accuracy may be
attained by a rearrangement of the keyboard based on the principles
underlying the touch method, and by a redistribution of the loads
assigned to the several fingers.

¹ Hoke, Elmer Rhodes, Ph.D.: “The Measurement of Achievement in
Shorthand,” The Johns Hopkins University Studies in Education, No. 6. Baltimore,
The Johns Hopkins Press, 1922, pp. vii + 118.

² Hoke, Roy Edward, Ph.D.: “The Improvement of Speed and Accuracy in
Typewriting,” The Johns Hopkins University Studies in Education, No. 7.
Baltimore, The Johns Hopkins Press, 1922, pp. 42.


6. On the Improvement of Examinations.—Anent the matter of
educational measurement and the unreliability of teachers' marks,
let it be said that practical school people can learn much from the
technique of standardized tests. This thesis is admirably set forth
in a recent bulletin.¹ Following an analytical critique and a similarly
analytical defense of such examinations, numerous practical
suggestions for improvement are presented and illustrated. Directions
are given for constructing and scoring true-false examinations,
recognition exercises and completion tests.


7. Biology and the Public Press.²—In the belief that the careful
determination of the curriculum of secondary schools necessitates
numerous quantitative studies, the joint authors of this monograph have
proceeded with one such investigation in the field of biology. They
have tried to ascertain the amount of biological information supplied
to the public through the columns of a representative selection of
American newspapers. In some 14,000 newspaper pages over 3000
articles were found to deal with some phase of biological material.
About three-tenths of the 25,000 running inches of biological matter
pertained to health. Articles on animals rank next in frequency.
These, with articles on plants and food, represent over 90 per cent of
the total space. From the nature of the evidence the public has
outgrown fictitious biology, but evolution seems either to have lost its
news value or has perhaps been deleted in consideration of those who
hold differing opinions, fictitious or otherwise. It is surprising to
find in the press so slight a reflection of the organized attempts to
prohibit the teaching of evolution.

Over a hundred typical clippings are reprinted in this publication
and the authors suggest that teachers may find these, or similar articles,
useful as points of contact or departure in instruction.

¹ Monroe, Walter S.: “Written Examinations and Their Improvement.”
University of Illinois Bulletin No. 9. Urbana, University of Illinois, 1922, pp. 71.

² Finley, Charles W. and Caldwell, Otis W.: “Biology and the Public Press.”
New York, The Lincoln School of Teachers College, 1923, pp. 161.


Tests and Practice Materials

1. Pintner-Cunningham Primary Mental Test.¹—This is a non-verbal
group test for use in the classification of kindergarten and primary
pupils. The test consists of a 16-page booklet of pictures to be marked
by the pupil according to standardized oral directions. The coefficient
of correlation between two trials of the test was 0.88 with one group
and 0.93 with another. The probable error of the score has been
found to be two points. The correlations with other mental tests
and criteria are not so high. By means of a table based on 856 cases,
mental age and IQ may be derived. “Scale Charts” of the “Percentile
Graph” are provided, and their use is discussed in the manual of
directions. Grade norms for mid-year are also given. The total
time necessary for giving the test is not mentioned in the manual.
The cost per pupil is approximately 6 cents.


2. Cole-Vincent Group Intelligence Test for School Entrants.²—This
is a non-verbal test for 5-, 6- and 7-year-olds. The standardization is
still in process, but a recent bulletin based on 751 cases shows the test
to be exceedingly discriminative and reliable over its whole range.
This cannot be said of some of the primary group tests with which it
has been compared. In another report from the field this test showed
higher correlations with Binet mental age scores than those obtained
when the Detroit or Dearborn tests were similarly compared with the
Binet.

¹ Pintner, Rudolf and Cunningham, Bess V.: “Pintner-Cunningham Primary
Mental Test.” The World Book Co., Yonkers on Hudson, 1923.

² Cole, L. W. and Vincent: “Cole-Vincent Group Intelligence Test for School
Entrants.” Bureau of Measurements, State Normal School, Emporia, Kansas.
INTELLIGENCE EXAMINATION DELTA 2

M. E. HAGGERTY
University of Minnesota

The primary purpose of this paper is to present a revised table of
age norms for the Haggerty Intelligence Examination, Delta 2, for
which there has been a constant and persistent demand since the
initial publication of the test. Advantage will be taken of the
occasion to present certain pertinent facts derived from the use of the
examination during the two years since it first became available for
general use.

The age norms by years and months are given in Table I. A
mental growth curve based on these norms is shown in Figure 1.
These norms are based on the results of the examination of more than
40,000 individuals ranging from Grade III of the elementary school to
the second year of college. The norms are not exact medians for any
particular group nor for all combined. There is increasing evidence
that groups of exactly the same median chronological age will differ
as much as a full year in mental age, as measured by examinations of
the Delta 2 type. Thus, 12-year-old pupils in the 1-teacher schools of
New York State score 75, whereas pupils of the same chronological
age in the larger rural schools of New York score 93, a difference of
18 points. These two groups are fairly large, 446 and 656 individuals
respectively, and there is high presumption that these 12-year-olds
are relatively unselected groups from their several communities.
Within the same city system two Grade VIII's of approximately the
same chronological ages may differ as much as 2 full years in mental
development, in terms of this test. Any age norm, therefore, based
upon any particular group of individuals, will be inaccurate, the
amount of inaccuracy depending upon the degree of selection
represented. Until someone devises a testing program which obviates the






258 The Journal of Educational Psychology 


Table I.—Haggerty Intelligence Examination, Delta 2*

Ages in                              Months
years            1     2     3     4     5     6     7     8     9    10    11

  7        7     8     9    10    11    12    13    15    16    17    18    19
  8       20    22    24    26    27    29    31    33    35    37    38    40
  9       42    43    45    46    47    49    50    51    53    54    56    57
 10       58    59    60    61    62    63    64    65    66    67    68    69
 11       70    71    72    73    74    75    76    77    78    79    80    81
 12       82    83    84    85    86    87    88    89    90    91    92    93
 13       94    95    96    97    98    99   [?]   [?]   101   102   103   104
 14      105   106   107   108   109   110   [?]   111   112   113   114   115
 15      116   117   118   118   119   120   121   121   122   123   124   124
 16      125   126   126   127   127   128   128   129   130   130   131   131
 17      132   132   133   133   134   134   135   135   136   [?]   [?]   137
 18      137   137   138   138   138   139   139   139   140   140   140   141
 19      141   141   142   142   142   142   143   143   143   143   144   144
 20      144

* Age norms for individuals of ages 7 to 20 years, based on about 40,000 cases.
Figures in first column opposite years indicate normal scores for individuals of
even ages. Figures in succeeding columns to right indicate normal scores for
months beyond even ages.


factor of selection, the most satisfactory age norms will be obtained by
inference and construction, based upon the differential data, from
multiple groups. No statistical procedure as yet proposed avoids the
necessity of some personal judgment in fixing age norms.

The function of such age norms is to serve as points of reference
for the scores made by children to be examined after the norms are
fixed. It is not necessary, for practical purposes, that such points
of reference represent with absolute exactness the median quality of a
perfectly unselected group of individuals. That the norms shall
approximate such scores within the range of the probable error
theoretically true for a genuinely unselected group of persons of each
chronological age considered, and that they remain constant, is all that the
practical uses of a table of norms demand.

How well the norms of Table I meet this practical criterion will be
evident in Table II, which gives the intelligence quotients, figured
in the usual way, for approximately 1000 children whose scores were
not considered in the 40,000 cases from which Table I was constructed.
The median intelligence quotient here is found to be 98.3. It was the
belief of Dr. Miller, who made the tests and accumulated a large amount
of information from achievement and other intelligence tests on these
1000 pupils, that the median IQ should be slightly less than 100, which
would be the theoretically correct score for a perfectly normal group.
While most of the cases were of the type to be found in the average
American community, there was a considerable sprinkling of foreign
children in the group, many of whom had language difficulties which
probably made for lower scores in the Delta 2 test.

Fig. 1.—Intelligence Examination, Delta 2. Mental growth curve. Figures on left
ordinate indicate score. Figures on base line indicate chronological age.
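
The text does not spell out how an intelligence quotient is “figured in the usual way.” The sketch below assumes the customary ratio, 100 times mental age over chronological age, with mental age found by locating the score among the even-age norms of Table I. The even-age values are given as best they can be read from the first column of that table; the straight-line interpolation between birthdays and the function names are our own simplifications, since Table I itself gives a separate figure for each month.

```python
# Even-age norms (score expected at exactly N years), read from the
# first column of Table I. Treat the values as a best reading.
EVEN_AGE_NORMS = {
    7: 7, 8: 20, 9: 42, 10: 58, 11: 70, 12: 82, 13: 94,
    14: 105, 15: 116, 16: 125, 17: 132, 18: 137, 19: 141, 20: 144,
}

def mental_age_months(score: float) -> float:
    """Mental age in months for a Delta 2 score.

    Assumption: linear interpolation between successive birthdays.
    Scores outside the table are clamped to its ends.
    """
    ages = sorted(EVEN_AGE_NORMS)
    if score <= EVEN_AGE_NORMS[ages[0]]:
        return 12.0 * ages[0]
    if score >= EVEN_AGE_NORMS[ages[-1]]:
        return 12.0 * ages[-1]
    for lo, hi in zip(ages, ages[1:]):
        s_lo, s_hi = EVEN_AGE_NORMS[lo], EVEN_AGE_NORMS[hi]
        if s_lo <= score <= s_hi:
            return 12.0 * lo + 12.0 * (score - s_lo) / (s_hi - s_lo)
    raise ValueError("score not covered by the norms")

def intelligence_quotient(score: float, chronological_age_months: float) -> float:
    """IQ as 100 * mental age / chronological age (the assumed 'usual way')."""
    return 100.0 * mental_age_months(score) / chronological_age_months

# The two New York groups of 12-year-olds cited earlier scored 75 and 93.
# Their exact ages are not given; 12 years 0 months is assumed here.
for score in (75, 93):
    ma = mental_age_months(score)
    iq = intelligence_quotient(score, 144)
    print(f"score {score}: mental age {ma / 12:.2f} years, IQ {iq:.1f}")
```

On this reading the two groups stand about a year and a half apart in mental age, which illustrates how strongly the choice of norming group can shift an age norm.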


Table II.—Distribution of Intelligence Quotients, Austin Public Schools

                            Elementary                            Junior high
IQ           3B    3A    4B    4A    5B    5A    6B    6A    7B    7A    8B    8A     9   Total

46-50         2    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..       2
51-55        ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..      ..
56-60        ..    ..     1    ..    ..    ..    ..    ..    ..    ..    ..    ..    ..       1
61-65         2    ..     1     1    ..    ..    ..     2     1    ..    ..    ..    ..       7
66-70         2     2     1     2     3     1     1     1     2     1     1    ..    ..      17
71-75         6     6     9    10     4    ..     6     4     1    ..    ..   [?]     2      40
76-80         8     2   [?]     8     6     4     2     5     4     2     4   [?]     2      66
81-85         8     6    12    10    11     7     2   [?]     6     4     3     8   [?]      88
86-90        12    11    14     4    10     6     4   [?]     4   [?]     3     5    16     108
91-95        14     3    19     6    11    10     7     9     8     5     3     8    16     118
96-100       18     8    16     6     7     8     8     5    10     6    11     3    12     118
101-105      12   [?]   [?]     6   [?]     7     7     6    13     7    12   [?]   [?]     118
106-110       8   [?]   [?]     4     6     6     7     5    15     6     7   [?]   [?]      90
111-115       2   [?]   [?]     3     6     4     6     4     5     3    10   [?]   [?]      63
116-120       3     1     6     1     7     2     8     6     4     1     8   [?]   [?]      66
121-125       4   [?]     2     1     2     2     6     1     4     6     2   [?]    11      43
126-130       1   [?]   [?]   [?]     2    ..     6     2     4   [?]   [?]     1     8      23
131-135      ..   [?]   [?]    ..    ..     1   [?]     3     4   [?]   [?]   [?]   [?]      14
136-140     [?]   [?]   [?]   [?]   [?]     1   [?]   [?]   [?]   [?]   [?]   [?]   [?]      13
141-145       3   [?]     1   [?]    ..     1   [?]   [?]   [?]   [?]   [?]   [?]   [?]      18

Totals      104    61   116    62    82   [?]    66    69    84    60    74    43   133     908
Medians     [?]    90    93    86    66    98   [?]   [?]   104    69   106   [?]   [?]    98.3


On page 261 are given the distributions for approximately 3600
pupils, in terms of their chronological ages, and of their mental ages
as measured by the Delta 2 test. These children are elementary
pupils found in the larger rural schools of New York state.¹
Histograms showing the distribution of the entire group in mental and
chronological ages are given as Figures 2 and 3.

¹ New York “rural” schools include towns of 4600 population and less. The
3000 cases are from schools having four and more teachers.

Fig. 2.—Intelligence Examination, Delta 2. Four-teacher elementary schools.
Grades III to V. Distribution by ages in terms of chronological and mental ages. Solid
line represents chronological age, broken line, mental age. 3076 cases.
Table III.—Intelligence Examination, Delta 2¹

           Grade III    Grade IV     Grade V      Grade VI     Grade VII    Grade VIII
Years      CA    MA     CA    MA     CA    MA     CA    MA     CA    MA     CA    MA

  7        21    42      1    31     ..     7     ..    ..     ..    ..     ..    ..
  8       142   190     13   142      1    59     ..    16     ..     3     ..    ..
  9       [?]   104    236   [?]     27   105     ..    36     ..     8     ..     2
 10        64    46    195   164    [?]   110     29    74      6    23     ..     3
 11        24    20    146    90    [?]   130    190   108     24    67     ..    16
 12        14     8    [?]    70    122   110    233   162    [?]    88     43    61
 13         6     2     47    29     78    78    136   127    [?]   113    138   [?]
 14        ..    ..     17     6     48    42     86   [?]    [?]   126    [?]   126
 15         1    ..      4     2     17    12     32    46     70    77    112   [?]
 16        ..    ..      1    ..      1     6      8    23     12    50     47    70
 17        ..    ..     ..    ..     ..     4      1    11      1    21     14    38
 18        ..    ..     ..    ..     ..    ..     ..     5     ..     9      4    30
 19        ..    ..     ..    ..     ..    ..     ..    ..     ..     6     ..    15
 20        ..    ..     ..    ..     ..    ..     ..    ..     ..     6     ..    25

Totals    412   412    728   728    669   669    713   713    687   687    660   666
Medians   9.2   8.8   10.6   9.9   11.8  11.3   12.5  12.7   13.6    14   14.4  14.8
Average
deviation [?]   [?]     .9   [?]    [?]   [?]     .8   [?]    [?]   [?]     .8   1.7

¹ Four-teacher elementary schools. Grades III-VIII. Age-grade distribution
in terms of chronological and mental ages. Medians and average deviations given
for both chronological and mental ages in each grade.


Some Considerations Employed in Determining the Norm

For the determination of the norms, as given in Table I, the
writer had available the results from several large groups of public
school children. The major ones employed were as follows: 6184
rural white pupils in 1-, 2-, 3- and 4-teacher schools, in the state of
Virginia; 3641 city white pupils, in the state of Virginia; 2323 white
pupils from the cities of Aberdeen, Baltimore, Cleveland, Evansville,
Indianapolis, Louisville, Rochester, and Santa Anna; 3755 pupils
in 4-teacher rural schools, in the state of New York; 3423 pupils from
1-, 2- and 3-teacher rural schools, in the state of New York. All
of these groups were distributed in about the normal proportion from
Grades III-VII (Virginia) and Grades III-VIII. There were also
available about 12,000 cases from elementary schools, furnished the
writer by the publisher from returns made by purchasers of the test.¹
In addition there was considerable data reported directly by users of
the test to the writer. Thus, there were 1000 Grade VIII cases from
tests given at one promotion period to all pupils finishing the eighth
grade in the city of Minneapolis, similar material from the city of St.
Paul, and from a large number of smaller cities throughout the country.
For high school students there were results from 1800 children, in
Grades IX-XII inclusive, from the New York survey. Similar
data, for approximately 1000 Grade IX children from the Virginia
survey, and a large amount of similar data, was furnished the writer
by users of the test in Wisconsin, Minnesota, Colorado, California
and elsewhere. This material came from both large and small high
schools.

In all of the foregoing data there were available not only medians, 
but complete distributions. Median scores for grades and ages have 
also been furnished in considerable numbers from elementary school, 
high school, and college and normal school groups. 

Methods Used in Determining Norms

The basic method used in determining norms was to construct
distribution tables for each of the several groups in terms of
chronological ages. The median scores for the several age groups were then
compared and a tentative table of age norms based upon these medians
was constructed. The further work consisted in adjusting these age
norms in the light of further considerations.

¹ These data were less valuable than desired, owing to the inability to estimate
in any form the amount of selection represented in most of the returns.

One method of making such adjustment was to select from a large
group of 3000 or more students those pupils who were of normal
chronological ages for the grades in which they were found. Median
age scores for one such group are to be found in Table IV. A
comparison of the tentative table showed that the medians for any age group
were not identical with the medians for children of the proper
chronological ages for the grades. The second method of readjustment was
to study the progress which typical groups showed from one
chronological age to the next, and a similar type of comparison for the progress
covering 2-year intervals (see Table V). A third method was the
construction of a mental-age-grade distribution of the type shown in

Table IV.—Delta 2 Median Scores for Pupils of Normal Chronological
Ages for the Grades in Which They are Found

Grade.                3     4     5     6     7     8     9    10    11    12
Chronological age.    9    10    11    12    13    14    15    16    17    18
Score.               43    68    75    92    99   118   123   132   137   143


Table V.—Intelligence Examination, Delta 2. Interage Steps Stated
in Terms of Test Scores

                                                  Two-year Age Intervals
Based on                        7-9  8-10  9-11 10-12 11-13 12-14 13-15 14-16 15-17 16-18 17-19 18-20

Age scores as given in
  Manual of Directions           ..    30    23    22    21    28   [?]    ..    ..    ..    ..    ..
New York 1-teacher schools       ..    18    33    27    13    11   [?]    ..    ..    ..    ..    ..
New York 4-teacher schools       ..    27    21    26    20    22    23    ..    17   [?]    ..    ..
Tabulations of 12,000 cases      ..    24    10    18    40    30     2    ..    17     2    ..    ..
New norms from Table I           35    38    28    24    24    23    22    20    16    12     9     7


Table VI.—Intelligence Examination, Delta 2¹

Grades.                       III    IV     V    VI   VII  VIII    IX     X    XI   XII
Score.                         39    67    75   [?]   [?]   116   125   136   136   141
Median age.                   9-2  10-6  11-7  12-6  13-6  14-5  15-1 16-[?] 17-1 17-11
Corresponding age score.       45    64    77    88   [?]   [?]   117   127   132   137

¹ Four or more teacher schools of New York. Grades III-XII. Median score
and median age with score corresponding to that age, for each grade.


Table III. Still another method is represented in Table VI. This
table gives the median scores and median ages for a group of New York
4-teacher elementary schools and 4- and more-teacher high schools. In
the third horizontal row of this is given the intelligence score called for
by the median ages of the pupils. Thus, in Grade V the median score
for the group is 75. The median age is 11-7 and the corresponding age
score called for by this median age is 77. In this case the actual
median score and the indicated score are not identical, nor are they
identical in any one of the grade groups. The difference in the case of
Grade IX is 8 points, which is almost as great as the difference between
the actual median scores of Grades VIII and IX. These differences
represent, in a degree not calculable from our data, the amount of
selection taking place in the several grade groups.
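
The comparison just described, between a grade's actual median score and the score its median age calls for, can be sketched as a simple table lookup. The monthly norms below cover only ages 10-0 through 12-11, where Table I rises by one point per month as we read it; the names `age_score` and `selection_gap` are illustrative and not the author's.

```python
# Monthly norms for ages 10-0 to 12-11 as read from Table I.
# Index 0 is the even age; index m is m months past the birthday.
MONTHLY_NORMS = {
    10: list(range(58, 70)),   # 58 at 10-0 up to 69 at 10-11
    11: list(range(70, 82)),   # 70 at 11-0 up to 81 at 11-11
    12: list(range(82, 94)),   # 82 at 12-0 up to 93 at 12-11
}

def age_score(years: int, months: int) -> int:
    """Score the norms call for at a given age in years and months."""
    return MONTHLY_NORMS[years][months]

def selection_gap(median_score: int, years: int, months: int) -> int:
    """Actual grade median minus the score called for by the median age.

    A positive gap suggests a grade selected upward relative to its age;
    a negative gap suggests the reverse.
    """
    return median_score - age_score(years, months)

# Grade V in the New York group: median score 75 at median age 11-7.
print(age_score(11, 7))          # score called for by the median age
print(selection_gap(75, 11, 7))  # actual minus called-for score
```

For Grade V this reproduces the figures quoted in the text: an age score of 77 against an actual median of 75.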

By successive readjustment of the figures of the first tentative
table in the light of considerations of the type represented by these
several forms in which the data were developed, the final norms as
represented in Table I were fixed upon.
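
As a check on the finished norms, the two-year interage steps can be recomputed from the even-age column of Table I and set beside the last row of Table V. The sketch below does only that subtraction; the even-age values are our best reading of the printed table, and the function name is illustrative.

```python
# Even-age norms (score expected at exactly N years), read from the
# first column of Table I. Treat the values as a best reading.
EVEN_AGE_NORMS = {
    7: 7, 8: 20, 9: 42, 10: 58, 11: 70, 12: 82, 13: 94,
    14: 105, 15: 116, 16: 125, 17: 132, 18: 137, 19: 141, 20: 144,
}

def two_year_steps(norms: dict) -> dict:
    """Gain in normal score across each two-year age interval."""
    return {
        (age, age + 2): norms[age + 2] - norms[age]
        for age in sorted(norms)
        if age + 2 in norms
    }

steps = two_year_steps(EVEN_AGE_NORMS)
for (lo, hi), gain in steps.items():
    print(f"{lo}-{hi}: {gain} points")
```

The steps shrink steadily from the younger to the older ages, which is the negative acceleration of the growth curve shown in Figure 1.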

The norms for the extreme ages represented in Table I call for a
further word of explanation. The Delta 2 Examination is probably
not a very satisfactory measure of intelligence below the age of nine
years. Norms for seven years and eight years are given, not as
indicating that the test should in general be used for these lower ages,
but in order to give some relative value to the low scores which are made
by children of nine and higher chronological ages. While the degree of
exactness characteristic of the other portions of the table is absent
from these lowest age norms, they will serve a genuine purpose in
indicating something regarding the mental quality of older children
who make these lower scores in the test.

While it is customary in stating age norms for group intelligence 
tests to assume a negative acceleration in the mental growth curve 
for ages beyond 14, and a cessation of mental development at 16 years, 
the results of the Delta 2 Examination indicate the desirability of 
stating age norms for chronological ages beyond sixteen. Almost 
without exception the reports from high schools indicate an increase of 
intelligence scores with each Bucceesive school grade (see Table lY). 
Reports from colleges and normal schools indicate a still higher level 
in many if not all oases. One obvious explanation for this increase of 
scores is the operation of the selective function of the school program 
which eliminates, year by year, the less intelligent pupils from the 
schools. It has not, however, been proved that those children who 
remain throughout the high school do not actually improve in their 
ability to make scores on group intelligence tests of the type of Delta 2. 
There is at least a fair assumption that just this increase of ability 
does take place, and that the actual increase of scores found from 
Grade IX to Grades XI, XII and XIII is only in part due to the factor 
of selection and in part due to actual mental growth. The construction
of age norms, therefore, from 16 to 20 will serve a useful purpose in
giving the means by which these children of the higher ages may be
rated for relative intelligence.

If it should ultimately be shown that the whole increase in median
scores from grade to grade, and age to age, is due to the operation of
selection, then the age norms would offend only by giving these older
and more intelligent pupils a relatively lower rating in terms of
intelligence quotients than they should actually have. If on the other
hand it should ultimately be shown that the increased score is due to
actual mental growth, then there is no offense except that of inaccurately
measuring the amount of this growth. If, as the writer suspects,
the truth lies between these extremes, these upper age norms will
still provide a more accurate measure of intelligence than if the growth
curve should become horizontal at 16 years.

Facts Derived from the Use of the Delta 2 Examination

It may be worth while in connection with the publication of the 
new age norms to add certain data bearing upon the usefulness of 
the test as a measure of intelligence and as a basis for predicting 
school success. 

The crucial question is this: Do children who score highest achieve
in school work the same relative standing that they do in the intelligence
examination? Doubtless the most accurate method of determining
this relationship is by calculating coefficients of correlation
between the scores of the test and other measures of ability and success. 
Such coefficients will be given in considerable numbers. First, 
however, we may use the simpler method of decile comparison. 

Decile Groups

The data represented in Table VII and in Figure 4 are derived
from tests on 200 unselected Grade VIII pupils. The figures in the
first column of the table number the successive deciles based on the
Delta 2 test; the figures of the second column are the median Delta
2 scores for the several decile groups; and the figures of column 3
are the summated scores for the several decile groups in the following
tests: Silent Reading (Haggerty Sigma 3, Form B); Spelling (Ayres-Breed);
Addition and Multiplication (Woody); History, Information
and Thought (Van Wagenen); and Arithmetical Problems (Delta 2,
Exercise 2). The crossed bar in Figure 4 represents the decile intelligence;
the black bar shows the decile achievement. There is evident
here a regular increase in achievement comparable to the increase
of intelligence scores.

Table VII.—Medians in Total Scores of Achievement Tests for Each
Decile Group in Intelligence Examination Delta 2¹

Column headings: Decile groups; Delta 2; Total achievement.
[Tabular entries not preserved.]

¹ Two hundred cases being all Grade VIII pupils tested with all tests in Erie
County, New York.

Fig. 4.—Comparison for each decile group in Intelligence Examination, Delta 2,
between median total achievement scores and median Delta 2 scores. Two hundred
cases being all Grade VIII pupils tested with all tests in Erie County, New York.
[Bar chart not preserved; horizontal scale 0 to 160.]
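
The decile comparison just described may be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `decile_medians`, the convention that decile 1 is the lowest tenth, and the sample data are hypothetical and are not drawn from the Erie County records.

```python
from statistics import median

def decile_medians(pairs):
    """pairs: list of (intelligence_score, achievement_total).

    Returns one row per decile: (decile, median intelligence score,
    median achievement total). Decile 1 is taken here to be the lowest
    tenth by intelligence score (an assumption of this sketch).
    """
    ranked = sorted(pairs, key=lambda p: p[0])
    n = len(ranked)
    rows = []
    for k in range(10):
        group = ranked[k * n // 10:(k + 1) * n // 10]
        rows.append((k + 1,
                     median(p[0] for p in group),
                     median(p[1] for p in group)))
    return rows

# Hypothetical data: 200 pupils whose achievement tends to rise with test score.
pupils = [(40 + i // 2, 50 + (i * 7) % 11 + i // 3) for i in range(200)]
table = decile_medians(pupils)
```

Each row of `table` corresponds to one pair of bars in a chart of the Figure 4 type: the second entry is the decile intelligence, the third the decile achievement.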
Coefficients of Correlation

The correlation method of evaluation is represented in Figure 5,
which shows the relation existing between the scores of the Delta 2
test and the criterion scores for 232 12-year-olds in the schools of
Westchester County, New York. The criterion score in this case was
composed of three items: the grade location of the pupil, his teacher's
rating for scholarship and the scores which he made in the Haggerty
Reading Examination, Sigma 3. In combining these items, the grade
location was multiplied by 10 and the teacher's ratings for scholarship,
given in figures 5, 4, 3, 2, and 1, were equated to the following
numbers: 9, 7, 5, 3, and 1. The raw scores in the reading examination
were used. The maximum score possible from this combination was
238 points. The actual maximum was 205 and the median score for
the group was 130.
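
The arithmetic of the criterion score can be shown in a short Python sketch. It assumes that the teacher's ratings run from 5 down to 1 and that 5 is the best rating; the function name and the example pupil are hypothetical.

```python
# Teacher's scholarship rating -> equated points, as stated in the text.
# Taking 5 as the highest rating is an assumption of this sketch.
RATING_POINTS = {5: 9, 4: 7, 3: 5, 2: 3, 1: 1}

def criterion_score(grade, teacher_rating, reading_raw):
    """Grade location times 10, plus equated rating, plus raw reading score."""
    return 10 * grade + RATING_POINTS[teacher_rating] + reading_raw

# Hypothetical pupil: Grade VII, teacher's rating 4, raw reading score 53.
example = criterion_score(7, 4, 53)  # 70 + 7 + 53 = 130
```

A pupil of this description would fall exactly at the group median of 130 reported above.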

The relation represented in Figure 5 will be clear from the following
description.

"The numbers along the base line represent the criterion score.
The heavy horizontal line across the middle of the figure indicates the
median score (96) in the intelligence examination, Delta 2. The horizontal
line next above (+1Q) is placed at a distance from the median
which is equivalent to the semi-interquartile range (Q) of the scores
in the Delta 2 examination. The second horizontal line (+2Q) above
the median is placed at twice the distance of the semi-interquartile
range above the median. Similarly, +3Q represents 3 times this
measure of variation. In like manner the horizontal lines −1Q, −2Q,
and −3Q represent corresponding distances below the median.

The heavy vertical line (M) represents the median criterion score
(130). The lines +1Q, +2Q, and +3Q represent distances above
the median of the criterion score equivalent to 1, 2, and 3 times the
semi-interquartile range (Q) of the criterion scores of the 232 children. The
vertical lines −1Q, −2Q, and −3Q represent similar distances below
the same median."

The dots in the figure represent individual children whose criterion
score may be obtained by locating on the base line the vertical for
each dot and whose test score is shown on the ordinate at the left.

"All of the dots inclosed within the two diagonal lines represent
children who do not differ in their relative standing in one test from
their relative standing in the other test by an amount greater than the
semi-interquartile range in either test. The children represented by
the dots outside the diagonal lines represent cases which do differ
in one test from the median score in that test by an amount relatively
larger than the variation which they achieve in the other test. To put
it in another way: The dots within the diagonal lines represent
children who are grouped in approximately the same manner by the
two measures used. The dots outside the diagonal lines show children
who are given different relative standings by the two measures. The
fact that relatively few dots are found outside the diagonal lines
indicates that the scores in the two measures give approximately the
same kind of classification."
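
The rule of the diagonal lines can be stated numerically. The following Python sketch expresses each score as a deviation from its median in units of Q and marks a pupil as "within the lines" when his two standings differ by not more than one Q. The function names and the nine pairs of scores are hypothetical, and the quartiles are computed by the default method of Python's `statistics.quantiles`, which is an assumption and need not agree with the method used for the figure.

```python
from statistics import median, quantiles

def q_units(scores):
    """Deviation of each score from the median, in units of Q,
    where Q is the semi-interquartile range (Q3 - Q1) / 2."""
    q1, _, q3 = quantiles(scores, n=4)
    q = (q3 - q1) / 2
    m = median(scores)
    return [(s - m) / q for s in scores]

def within_diagonals(test_scores, criterion_scores):
    """True where the two relative standings differ by no more than 1 Q."""
    u = q_units(test_scores)
    v = q_units(criterion_scores)
    return [abs(a - b) <= 1.0 for a, b in zip(u, v)]

# Hypothetical scores for nine pupils; the last pupil scores high on the
# test but low on the criterion.
test = [60, 70, 80, 90, 96, 100, 110, 120, 130]
crit = [90, 100, 115, 125, 130, 135, 145, 160, 100]
flags = within_diagonals(test, crit)
```

Only the last pupil falls outside the band in this made-up example; the proportion of such cases is what the figure displays at a glance.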

The coefficient of correlation (Pearson Product Moment Method)
for the data shown in Figure 5 is 0.86 ± 0.0118.
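
The figure following the ± sign is a probable error. A minimal Python sketch, assuming the classical formula P.E. = 0.6745(1 − r²)/√N was the one employed (the text does not say so explicitly), is:

```python
import math

def probable_error_of_r(r, n):
    """Classical probable error of a product-moment coefficient,
    0.6745 * (1 - r^2) / sqrt(n). Assumed, not stated, in the text."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)

# r = 0.86 on 232 cases, as reported for the Westchester County group.
pe = probable_error_of_r(0.86, 232)
```

With the rounded coefficient 0.86 this gives a value near 0.0115, close to but not identical with the printed 0.0118; the small difference may arise from the use of an unrounded r.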

To supplement the two illustrations just given, reference may be
made to published studies. The first critical study of the Delta 2
Examination published by any person other than the writer was that
by Holley,¹ who used with the same pupil groups in the public schools of
Champaign the following tests: (1) Otis Group Intelligence Scale;
(2) Theisen-Fleming Classification Test, Form A; (3) Whipple's
Group Test for Grammar Grades; (4) Pressey Primer Scale; (5)
Haggerty Intelligence Examination, Delta 2; and (6) Holley Sentence
Vocabulary Scales.

The average coefficient of correlation of all the test results and the 
teachers’ rating for scholarship was +0.462. The coefficients for the 
Delta 2 were as follows: 


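    Grade III ............ .57 ± .06
    Grade IV ............. .45 ± .06
    Grade V .............. .56 ± .04
    Grade VI ............. .69 ± .03
    Grade VII ............ .71 ± .04
    Grade VIII ........... .68 ± .05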


The average coefficient is 0.592. None of the other tests given yielded 
such uniformly high coefficients. 

A valuable method by which to determine the value of one intelligence
test is to check it against other intelligence tests of known or
assumed validity. Such a study was made by Stenquist,² who reports
the results of an extended investigation on the validity of group
intelligence examinations, of which the Delta 2 was one. In this study
a criterion composed of the sum of all the test results was used as a
measure of each test. The coefficients of correlation for each test
with this criterion are as follows:

    National A ...................... r = 0.801 (n = 560)
    National B ...................... r = 0.788 (n = 518)
    Otis (Advanced) ................. r = 0.680 (n = 551)
    Haggerty, Delta 2 ............... r = 0.808 (n = 532)
    Visual Vocabulary ............... r = 0.680 (n = 461)
    Kelley-Trabue ................... r = 0.58  (n = 681)
    Myers Mental Measure ............ r = 0.48  (n = 644)
    Woody-McCall Arithmetic ......... r = 0.39  (n = 298)

¹ Holley, Charles E.: Mental Tests for School Use, University of Illinois
Bulletin, Vol. 17, No. 28, 1920.

² Stenquist, John L.: Unreliability of Individual Scores in Mental Measurements.
Journal of Educational Research, Vol. IV, p. 347 ff.


The coefficient for the Delta 2 is as high as that for any test, 
slightly higher than for some and very much higher than for others. 
Stenquist also reports correlations for Delta 2 with other group tests 
as follows: 

    National Scale A, 600 cases in Grades IV to VIII ... r = .81 ± .01
    Otis Advanced ...................................... r = .69 ± .02
    National Scale B, 60 cases ......................... r = .69 ± .04


A similar study is reported by Franzen,¹ who used also the method of
partial correlation in an attempt to evaluate each of 14 group intelligence
examinations. All of the tests were given to the same group of
67 first-year high school pupils. Each of the tests was checked against
a criterion composed of the sum of the scores in all the tests. The
results were presented in tables showing among other things the
correlation of each test with the total, the correlations of each test
with the thirteen others, and the inter-correlations of all tests, with
reading ability (Thorndike Alpha 2) rendered constant. From these
data Franzen draws his conclusions as to the value of the several tests.
Table VIII gives the chief findings of Franzen's study. The Delta 2
he includes along with the Otis and the National Tests, all of which
"give a fairly good account of themselves" in all of the tables.


¹ Franzen, Raymond: Unpublished manuscript.

Table VIII

Column (1): Average of correlations of each test with the other 13 tests.
Column (2): Correlation of each test with total score of 13 tests.
Column (3): Correlation of each test with total score of 13 tests with
reading (Thorndike Alpha 2) rendered constant.
Entries marked … are illegible in the source.

                                  (1)      (2)      (3)
     1. Terman A ..............   .76       …       .86
     2. National A ............    …       .93      .84
     3. Haggerty ..............   .73      .91      .78
     4. Illinois General ......   .72       …       .71
     5. Otis ..................    …        …       .78
     6. Mentimeter ............   .66      .81      .87
     7. Survey ................   .65      .87      .80
     8. National B ............    …        …       .78
     9. Thorndike Reading .....    …       .81       …
    10. Dearborn 1 ............   .68      .74      .81
    11. Pressey Cross-outs ....   .56      .70      .61
    12. Dearborn 2 ............   .65      .72      .75
    13. Wylie .................    …       .04      .24
    14. Myers .................   .46      .63      .77
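
The expression "rendered constant" in Franzen's study refers to partial correlation. As a sketch only, the usual first-order formula may be written in Python as follows; whether Franzen computed his coefficients in exactly this way is an assumption, and the three coefficients in the example are hypothetical rather than taken from Table VIII.

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation of variables 1 and 2
    with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical values: test with total = .90, test with reading = .60,
# total with reading = .70.
r_partial = partial_r(0.90, 0.60, 0.70)
```

In this made-up case the coefficient falls from .90 to about .84 when reading ability is held constant.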


Gates,¹ reporting a recent study on the relation of achievement
in school subjects to the scores in intelligence tests, cites the correlation
of each of 14 intelligence tests with each of the others, and the correlation
of each test with a composite measure of achievement in school
subjects. He finds only two tests with a higher mean inter-examination
correlation than the Delta 2 (see Table IX). The advantage
of one of these, which requires a third more time in giving, is only 0.01,
and of the other, which requires more than double the amount of time,
is but 0.05. Only two of the group examinations requiring as small
an amount of time (National A and National B) show as high correlations
with the composite of achievement. The Delta 2 shows practically
the same correlation to the composite of achievement (0.62) as does
the Stanford Revision of the Binet Scale, and no group examination
employing so small a measure of time showed quite so high a correlation
with the Stanford Mental Age.

¹ Gates, Arthur I.: The Correlations of Achievement in School Subjects with
Intelligence Tests and Other Variables. Journal of Educational Psychology, Vol.
XIII, p. 223.

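Important figures collected from several of Gates' tables here
follow:

Table IX

Column (1): Time (minutes).
Column (2): Heading illegible in the source.
Column (3): Mean r with achievement.
Column (4): Mean r with 13 tests.
Entries marked … are illegible in the source.

                               (1)     (2)     (3)     (4)
    Dearborn total .........    80     .58     .47     .44
    Otis advanced ..........    47      …       …      .63
    Dearborn 6 .............    46     .49     .43     .43
    Dearborn 4 .............    36     .62     .38     .41
    National total .........    33     .61     .63     .50
    Thorndike-McCall .......    30     .67     .48     .46
    Terman Group ...........    27      …      .66     .49
    HAGGERTY, DELTA 2 ......    21     .48     .62     .48
    National A .............    17     .47     .66     .48
    Illinois ...............    17     .46     .48     .48
    National B .............    10     .45     .66     .47
    Myers ..................     …     .28     .12     .21
    Holley .................     …     .42     .43     .37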


Miller¹ reports the results of correlating the scores in several
intelligence tests with each other and with the school marks of 55 Grade
IX pupils. The relation which the Delta 2 bears to these several
measures may be seen in the following table:

¹ Manual of Directions, Miller Mental Ability Test, p. 21.

Correlations (Pearson) of Miller Test with Other Tests and with School
Marks—55 Grade IX Pupils, University of Minnesota High School

[The column alignment of this table is not preserved. Legible column
headings: Delta 2; Terman Form A; Alpha Form 8; Average five tests;
Otis test. Legible entries are given row by row in their printed order.]

    Miller ..................... .784   .747   .76    .663   .734
    Delta 2 .................... .817   .778   .685   .884   .503   .716
    (row label illegible) ...... .823   .714   .931   .929   .741
    (row label illegible) ...... .712   .842   .716
    (row label illegible) ...... .842   .664
    (row label illegible) ...... .976
    (row label illegible) ...... .841

All correlations are positive.


On the basis of tests given by Dickson in the Oakland Schools,
the Delta 2 shows a coefficient of 0.65 ± 0.039 with the Army Alpha.
From the same data the coefficient of correlation with the Stanford
Revision of the Binet Scale has been found to be 0.84 ± 0.018. Similar
figures have been furnished the writer by Superintendent Bliss,
Dr. Elizabeth Woods, and others.

The writer has calculated coefficients of correlation using the scores
from the Delta 2 and the Haggerty Reading Examination, Sigma 3,
Form B and the Miller Mental Ability Test on 442 Grade IX pupils.
The results are as follows:

Table X.—Coefficients of Correlation Based on 442 Cases of Grade IX
Pupils in Large High Schools, Involving Intelligence Examination
Delta 2, Reading Examination Sigma 3, and Miller Mental Ability Test

Entries marked … are illegible in the source, as is the label of the
third variable.

                               Delta 2    Sigma 3    (illegible)   Miller
    Delta 2         r    =                   …          .02         .61
                    P.E. =                 ±.021       ±.006       ±.021
    Sigma 3         r    =        …                     .86         .79
                    P.E. =      ±.021                    …         ±.012
    (illegible)     r    =       .02        .86                      …
                    P.E. =      ±.006      ±.009                     …
    Miller          r    =       .61        .79         .65
                    P.E. =      ±.021      ±.012       ±.025



The relation of the Delta 2 test to the Van Wagenen history
scales may be observed in Table XI, which gives the coefficients of
correlation based on the scores of 152 Grade VIII pupils.

Table XI.—Coefficients of Correlation. Intelligence Examination
Delta 2, Reading Examination Sigma 3, and Van Wagenen History
Tests. One Hundred and Fifty-two Cases in Grade VIII

                                      Intelligence    Reading        Combined score
                                      examination     examination    of Delta 2
                                      Delta 2         Sigma 3        and Sigma 3
    History information     r    =      .45             .60            .54
                            P.E. =     ±.043           ±.041          ±.04
    History thought         r    =      .71             .78            .69
                            P.E. =     ±.028           ±.024          ±.03
    Combined history tests  r    =      .63             .03            .70
                            P.E. =     ±.033           ±.036          ±.024

Coefficient: History information and History thought = 0.60 ± 0.035.


The actual significance to be attached to any or all of the coefficients
of correlation printed in the foregoing tables cannot be accurately
stated. They are calculated on groups which vary greatly in
character and are subject to various modifying influences which may
unduly raise or lower the theoretically correct figures. The constancy,
however, with which different investigations report significant coeffi¬ 
cients is fairly conclusive evidence that the Delta 2 Examination has 
high rank among tests of this type. It is hoped that the new table of 
age norms printed herewith will measurably increase the usefulness 
of the test by enabling a more accurate determination of mental 
ages and intelligence quotients. 





















AN ANALYSIS OF THE ERRORS IN MENTAL
MEASUREMENT

KARL J. HOLZINGER
University of Chicago

In the early development of tests and scales we were content to
apply a good deal of biometric statistical method in the construction
of the tests themselves and in their applications to educational
problems. This biometric calculus, however, was originally developed
for the purpose of describing quite different types of traits from those
studied in psychology and education. The description of a group of
human skulls is a very different matter from the description of certain
obscure reactions which go on within these same skulls, and for this
reason some workers in mental measurement have come to question
the strict applicability of many of the conventional statistical methods
to their data.

Spearman¹ recognized that in measuring mental traits certain
errors arise which do not need to be considered in describing fixed
physical objects. These errors in "faulty data" are chiefly due to the
fact that we are measuring variable traits by an indirect procedure.
Both the variability of the traits from moment to moment and the
indirectness of the determination lead to types of error which must be
recognized and studied with care. In the last two or three years the
attention of several American workers in tests has been turned to these
problems. Monroe, Kelley, Otis and others have set forth certain
formulae which attempt to take into account the degree of reservation
with which we may regard the accuracy of test results. The purpose of
this article is to analyze the more important of these errors, to indicate
tentative formulae for their study, and to point out their importance
in the interpretation of mental measurements and in scale construction.
To cover this much ground in a short space will imply sketchiness,
but it is hoped that if the analysis proves helpful the details may be
filled in later.

Two Kinds of Educational Scales or Tests

In order to carry out the analysis suggested it is necessary to recognize
two types of scales which are in use at the present time. The first
kind may be characterized as the quality or product scale. Examples
may be found in the current composition scales and in the Ayres Handwriting
Scale. A sample of the pupil's work is obtained and this is
matched by the teacher with a scaled specimen of known merit. The
score is that of the specimen which the sample most closely resembles.
Now it is clear that several kinds of error may creep into an evaluation
of this sort. The scale itself may not be accurately set up, the pupil
may give an unrepresentative sample of his work for matching, and
finally the teacher may rate the sample inaccurately.

¹ Spearman, C.: The Proof and Measurement of Association between Two
Things. Amer. Jour. of Psy., Vol. XV, 1904.

The second type of test may be called a performance test. In this
case the pupil responds directly to the test material which is set before
him. His score is determined by his direct response to the questions
or items which he is required to cover. Here, again, the material
may not be well graded, the pupil may not make a representative
response, and different scorers may not agree in the numerical value
to be assigned to a given performance. With objective tests this last
type of error may be practically eliminated. The difference between
product and performance scales from the point of view of errors lies
chiefly in the fact that the scoring of the latter may be made much
more objective.

Types of Error in Mental Measurement

It is now possible to formulate the kinds of error which need to
be studied in connection with both types of scales. We shall 
enumerate five with the understanding that the classification is neces¬ 
sarily crude at the present stage of development in mental 
measurement. 

1. Scale Errors.—(a) In product scales these errors are due to imperfections
in the material arising from poor selection and graduation of
the specimens. In matching a pupil sample with these specimens an
error will occur by assigning the incorrect scale value of the specimen.
For a group of pupils rated by the same teacher such errors will tend
to be constant, i.e., the same for all pupils whose work is matched with
a given specimen. They are difficult to study because true scale values
are unknown, and are further obscured by the subjective procedure
in rating pupil samples.

(b) In performance tests the problem is again obscured by the
direct response of the pupil to the test material. The selection,
graduation, and arrangement of the items will affect the response made
by the pupil. The writer constructed a reasoning test which on analysis
was found to measure speed in handwriting to a very considerable
extent. It was also found that a repetition of the test did not produce
very consistent results. By changing the form of the response required
and by lengthening the test these two defects were largely remedied,
but a new factor of fatigue was introduced. The test was improved
in its validity and reliability, by which are meant the extent to which
it measured what it purported to measure, and consistency of response
on repetition with the same pupils. Scale error with performance
tests is thus so intimately related to response that it is difficult to determine
how much of the variation with a repeated test is due to imperfections
in the material and how much to actual fluctuations in ability.
In the present paper we shall not attempt to analyze this problem
any further, but will classify the gross fluctuations on repetition as
errors in response with the understanding that the chief contributing
cause is usually immediate variation in ability.

2. Scoring Error (γ).—In measuring pupils' abilities with tests,
numerical values are assigned by the examiner to the samples to be
matched with scale specimens or to the responses made on performance
tests. Scoring procedures of both types of tests will lead to errors
which may be distinguished from those already mentioned. In the
case of the product scale, scoring error will in general be large on
account of the subjectivity involved in estimating sample merit. In
Dr. Theisen's report¹ on the Trabue Scale a single composition, C-3,
was rated all the way from 2.8 to 9.0 by 16 teachers. For performance
tests, on the other hand, such errors will usually be small. It is
possible to prepare test material and to formulate scoring directions
so that competent examiners will score a given performance in the
same way.

3. Response Error (δ) is due to the fact that pupils respond differently
on successive trials with a test when short time intervals separate
the trials. These fluctuations may be attributed to effort, emotional
status, concentration, etc. They cannot be ascribed very well to any
fundamental change in the ability in question. We are measuring
mental traits which exhibit instantaneous variation and we can never
be sure at what phase of this variation a given performance occurs.

The procedure is a good deal like that which would be involved
in measuring the length of earthworms during a series of expansions
and contractions. We should not think of comparing the length of
two worms from single measurements under such varying conditions.
If we wished to make a useful comparison we would probably make
several determinations and strike an average for each worm. Now a
human ability is more difficult to measure than the length of an earthworm
because we can never be sure whether a particular performance
occurs under "expansion" or "contraction." The best method we
have at present is to make at least two determinations for each individual
in a group and from the series of differences thus obtained set
up a measure of the probable divergence of a given score from the
theoretically true score. The factor of group practice effect may be
eliminated as will be indicated below.

¹ Theisen, W. W.: Improving Teachers' Estimates of Composition Samples
with the Aid of the Trabue, Nassau County, Scale. School and Society, Vol. VII,
February, 1918, pp. 143-50.

4. Sampling Error (ε).—If we confine our statistical descriptions and
inferences to a particular group this error does not occur, but when we
wish to extend these inferences beyond the results of the particular
sample, it must be taken into account. The sampling error of a
statistical constant tells us the amount and probability of the variation
we may expect if the same constant is worked out from another sample.
This index of reservation should be carefully distinguished from all
other types of "error," yet it is not infrequently supposed to have something
to do with the arithmetical accuracy of the computations.
There is probably no conception in statistical method so commonly
misunderstood.

5. Sporadic Errors are those due to arithmetical blunders in scoring,
misunderstanding of test directions, time lost by the pupil with a
broken pencil, etc. Such errors may be eliminated. They do not
lend themselves to mathematical treatment such as is possible in the
case of scoring, response, and sampling error.

Functional Relationships between Ability and Score 

Certain functional equations may now be set down expressing
the relationships between ability and score with the types of error
enumerated. If attention be confined to a particular group and no
errors considered, the expression may be written,

\[ \text{(a)}\quad \text{Ability} = f(\text{score}) \]

i.e., given a particular score, the ability is uniquely determined. This
theory, upon which much of our early statistical work was based, is not
tenable because errors do exist and may not be neglected.

In the case of a product scale the relationship becomes

\[ \text{(b)}\quad \text{Ability} = g(\text{score}, \gamma, \delta) \]

if the existence of scoring and response error be admitted. For performance
tests which are objectively scored the equation is

\[ \text{(c)}\quad \text{Ability} = h(\text{score}, \delta) \]

Finally, if inferences are extended beyond the particular group
measured, each of these three relationships will involve sampling
error, ε, so that the most general expression with which we have to
deal is

\[ \text{(d)}\quad \text{Ability} = k(\text{score}, \gamma, \delta, \varepsilon) \]

Formula for Scoring Error with the Product Scale 

If several teachers rate a single sample with a product scale the
best assumption that can be made regarding these judgments is that
they are distributed according to the Gaussian law. Actual distributions
of residuals check the assumption fairly well. Employing the
usual formula found in any work on least squares we have

\[ \mathrm{P.E.}_{X} = 0.6745\sqrt{\frac{\sum v^{2}}{n-1}} \tag{1} \]

where $X$ = obtained rating, $v = X - M$ = variation of such a rating
from the mean, i.e., residual, and $n$ = number of judges. In the case
of the 16 judgments of composition C-3, mentioned above, $\mathrm{P.E.}_{X} = 1.73$,
which means that it is an even chance that the true judgment will
vary from the obtained by this amount.
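
A short Python sketch of formula (1) follows. It reads the formula as the ordinary least-squares probable error of a single observation, 0.6745 times the square root of Σv²/(n − 1); that reading, the function name, and the five ratings are assumptions made for illustration, since the 16 original judgments of composition C-3 are not reproduced here.

```python
import math

def probable_error_single_rating(ratings):
    """0.6745 * sqrt(sum of squared residuals / (n - 1)),
    the residuals being taken from the mean of the ratings."""
    n = len(ratings)
    mean = sum(ratings) / n
    sum_sq = sum((x - mean) ** 2 for x in ratings)
    return 0.6745 * math.sqrt(sum_sq / (n - 1))

# Hypothetical ratings of one composition by five judges.
ratings = [3.0, 4.0, 5.0, 6.0, 7.0]
pe_rating = probable_error_single_rating(ratings)
```

For these made-up ratings the probable error is a little over one scale point, so that a single judge's rating is as likely as not to lie more than that distance from the mean judgment.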

Response error with product scales is due to the differences in
samples which pupils submit for rating. With constant scoring
error a measure of variability in response may be obtained by working
with the differences in successive samples for a given group. Formula 2
(below) might then be applied. As a matter of fact scoring error will
not be constant for the two sets of samples to be compared, and will so
obscure the result that a mathematical formulation is extremely difficult.
It will not be attempted in this paper.

Formulae for Response and Sampling Error with Performance Tests

Let $X_1$ and $X_2$ denote scores on successive trials of a test with a
given group, $\delta_1$ and $\delta_2$ the respective response errors, and $X_t$ the true
score by either trial, so that $X_1 = X_t + \delta_1$ and $X_2 = X_t + \delta_2$.
Setting $d = X_2 - X_1 = \delta_2 - \delta_1$ we may obtain an expression for $\mathrm{P.E.}_{\delta}$
or $\mathrm{P.E.}_{X_1}$ (to distinguish it from the error in the mean) in terms of $d$,
which is the known difference between successive pairs of scores.
According to the law of error $\sigma_d = \sqrt{\sigma_{\delta_1}^{2} + \sigma_{\delta_2}^{2}}$ or $\sqrt{2}\,\sigma_{\delta}$ if the
standard errors on both trials are equal and the individual errors uncorrelated.
We may then write

\[ \mathrm{P.E.}_{X_1} = \frac{0.6745}{\sqrt{2}}\,\sigma_d = 0.477\,\sigma_d \tag{2} \]

It is evident that this function is independent of group practice effect,
$M_2 - M_1$, since $\sigma_{d+c} = \sigma_d$ where $c$ is a constant. Formula (2) is
the same as that obtained by Monroe and others. It is equivalent
to Merriman's¹ formula (39) with the assumption that the scores are
of equal weight.
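
Formula (2), and its independence of group practice effect, may be illustrated by the following Python sketch. The scores are hypothetical, and the standard deviation of the differences is computed with N in the denominator (`statistics.pstdev`), which is an assumption about the convention intended.

```python
from statistics import pstdev

def pe_response_from_differences(first, second):
    """Formula (2): 0.6745 / sqrt(2) times the standard deviation of the
    differences between second-trial and first-trial scores."""
    d = [b - a for a, b in zip(first, second)]
    return 0.6745 / 2 ** 0.5 * pstdev(d)

# Hypothetical scores of five pupils on two trials.
trial_1 = [50, 62, 71, 80, 95]
trial_2 = [54, 60, 77, 83, 96]
pe_response = pe_response_from_differences(trial_1, trial_2)

# A uniform practice gain of 10 points on the second trial leaves it unchanged.
pe_with_practice = pe_response_from_differences(trial_1, [x + 10 for x in trial_2])
```

The two results agree because adding a constant to every difference alters the mean of the differences but not their spread.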

The writer has attempted to check some of the assumptions
involved in the above proof by means of empirical tests. The assumption
that the error, δ, is normally distributed is checked roughly by
the fact that distributions of $d = X_2 - X_1$ resemble the Gaussian
curve rather closely. Data will be presented in a later article. From
the results of repeated tests in arithmetic and intelligence it was also
possible to check the assumption $r_{\delta_1\delta_2} = 0$ by working out the
corresponding coefficients. Six values for arithmetic material are as
follows: 0.05, +0.04, +0.11, +0.13, +0.15, +0.32 (N = 62, Grade V),
the last coefficient being significant. Brown and Thomson² find higher
correlation with different test material, and on the strength of such
evidence question the validity of Spearman's formulae for attenuation.
Finally the assumption that the scores are of equal weight implies that
they will have equal probable errors. The correlation $r_{X\delta}$ should
therefore be zero within the ordinary limits. By actual test
$r_{X\delta}$ = −0.22, −0.26, −0.27, −0.29 (N = 62, $\mathrm{P.E.}_{r}$ = 0.08). It therefore
appears that a small response error is associated in general with a high
score and vice versa. This is contrary to the law for accidental errors
of observation, i.e., $\mathrm{P.E.}_{l} = \mathrm{P.E.}_{u}\sqrt{\text{length}}$, which implies that the
larger errors are associated with larger linear magnitudes. On the whole
the assumptions involved in the proof of (2) appear to be roughly
justified but a good many careful tests with different types of material
are needed.

Formula (2) may be written in another form. Since $d = X_2 - X_1$,
$\sigma_d^{2} = \sigma_1^{2} - 2r_{12}\sigma_1\sigma_2 + \sigma_2^{2}$, or $\sigma_d = \sigma_1\sqrt{2(1 - r_{12})}$ if $\sigma_1 = \sigma_2$.
Substituting in (2) gives

\[ \mathrm{P.E.}_{X_1} = 0.6745\,\sigma_1\sqrt{1 - r_{12}} \tag{3} \]

This formula is often more convenient than (2) because if the reliability
coefficient is known, the required probable error may be readily
obtained. The two formulae serve as mutual checks.

¹ Merriman, M.: "Method of Least Squares." New York: John Wiley & Sons,
1915.

² Brown and Thomson: "Essentials of Mental Measurement." England:
Cambridge Press, 1921, p. 168.
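
The sense in which formulae (2) and (3) check one another can be shown numerically. In the Python sketch below the standard deviation and reliability coefficient are hypothetical, and the two trials are assumed to have equal standard deviations, as the derivation of (3) requires.

```python
import math

def pe_from_sigma_d(sigma_d):
    """Formula (2): probable error from the S.D. of the differences."""
    return 0.6745 / math.sqrt(2) * sigma_d

def pe_from_reliability(sigma, r12):
    """Formula (3): probable error from the S.D. of the scores
    and the reliability coefficient."""
    return 0.6745 * sigma * math.sqrt(1 - r12)

# Hypothetical test: S.D. of 12 points on each trial, reliability .82.
sigma, r12 = 12.0, 0.82
sigma_d = sigma * math.sqrt(2 * (1 - r12))  # S.D. of differences when the S.D.s are equal

pe_by_2 = pe_from_sigma_d(sigma_d)
pe_by_3 = pe_from_reliability(sigma, r12)
```

Both routes give about 3.4 points here. With real data the two standard deviations are seldom exactly equal, so the agreement will be close rather than exact.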

The interpretation of the probable error of response is that it is
an even chance that the next score or the true score of an individual
will differ from the obtained by the given amount. This interpretation
must be modified, however, in the light of the above experiment
on correlation between score and error. If a pupil makes a high score
on a test the probable divergence of a second score will be less than that
predicted by formulae (2) or (3).

As an example, the probable error of response for the Terman Group
Intelligence Scale, Forms A and B, with 135 first-year high school pupils
was 6.6 points.
Using 3P.E. as a criterion for safe prediction this means that a pupil’s 
true score will lie within the range 20 points below to 20 points above 
the obtained value with practice effect eliminated, or that we are 
reasonably sure that his true score is within 20 points of the obtained. 

The probable error in the mean is often more useful than that for
an individual score. Assuming that the law of error again holds we
may write

$$\mathrm{P.E.}_{M_\delta} = \frac{\mathrm{P.E.}_{x_1}}{\sqrt{N}} = \frac{0.6745\,\sigma_{x_1}\sqrt{1 - r_{12}}}{\sqrt{N}} \qquad (4)$$

For the Terman Scale $\mathrm{P.E.}_{M_\delta} = 0.6$ points. With groups of this size
(135) the response error in the mean is often relatively small, but not
negligible.

The formula for sampling error in the mean is

$$\mathrm{P.E.}_{M} = \frac{0.6745\,\sigma_{x_1}}{\sqrt{N}} \qquad (5)$$


If we assume sampling and response error both present but uncorrelated,
formulae (4) and (5) may be combined so as to give an expression
useful in testing differences. Since $\mathrm{P.E.}_{a+b} = \sqrt{\mathrm{P.E.}_a^2 + \mathrm{P.E.}_b^2}$
for uncorrelated errors $a$ and $b$ we have

$$\mathrm{P.E.}_{M+\delta} = \frac{0.6745\,\sigma_{x_1}\sqrt{2 - r_{12}}}{\sqrt{N}} \qquad (6)$$

It will be noted that when a test is perfectly reliable, i.e., $r_{12} = 1$,
this expression reduces to (5) and that as $r_{12}$ decreases the probable
error increases. With the Terman material $r_{12} = 0.91$ and $\sigma_{x_1} = 16.8$,
$\mathrm{P.E.}_{M} = 1.44$ and $\mathrm{P.E.}_{M+\delta} = 1.47$ so that the contribution of the
error $\delta$ appears to be relatively slight. With a reliability coefficient of
0.8, $\mathrm{P.E.}_{M+\delta} = 1.1\,\mathrm{P.E.}_{M}$. Thus with ordinary tests a
difference of about 10 per cent may be expected when response as well
as sampling error is taken into account in interpreting the significance
of the mean.
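The probable-error formulae for response error in a score, response error in the mean, sampling error in the mean, and the two combined can be checked numerically. The following is only a minimal sketch: the function names are hypothetical, and the values of sigma and N in the usage note are illustrative assumptions, not data from the study.

```python
import math

PE_FACTOR = 0.6745  # converts a standard error into a probable error


def pe_response_score(sigma, r12):
    """Probable error of response for a single score: 0.6745*sigma*sqrt(1 - r12)."""
    return PE_FACTOR * sigma * math.sqrt(1.0 - r12)


def pe_response_mean(sigma, r12, n):
    """Probable error of response in the mean of n scores."""
    return pe_response_score(sigma, r12) / math.sqrt(n)


def pe_sampling_mean(sigma, n):
    """Probable error of sampling in the mean of n scores."""
    return PE_FACTOR * sigma / math.sqrt(n)


def pe_combined_mean(sigma, r12, n):
    """Sampling and response error combined, assuming they are uncorrelated."""
    return PE_FACTOR * sigma * math.sqrt(2.0 - r12) / math.sqrt(n)


def inflation_ratio(r12):
    """Ratio of the combined probable error to the sampling error alone."""
    return math.sqrt(2.0 - r12)
```

With an assumed sigma of 20 and N of 100, `pe_combined_mean(20, 0.8, 100)` exceeds `pe_sampling_mean(20, 100)` by the factor `inflation_ratio(0.8)`, about 1.1, which is the 10 per cent increase mentioned for tests of ordinary reliability.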

The Importance of Errors in Interpreting Results 

At the risk of repetition it may be well to point out in some detail
the importance of errors of the types analyzed in the interpretation
of educational measurements. Few workers with tests view the score
of a pupil from a single trial of a test with any great degree of confidence,
but until recently they have not had adequate means by which the
degree of reservation could be expressed in numerical terms. It is a
rather enlightening thing for anyone who struggled through the Army
test to know that on repetition it is an even chance that he would have
scored 16 points higher (or lower). Where standardized tests are
used to measure individual progress during remedial instruction it is
also important to know whether or not the gain of a particular pupil
is significant. Formulae (1), (2), and (3) help to answer these questions.

In the case of experimental work it is necessary to have satisfactory
means for testing the difference between averages. For example, a
common procedure in evaluating a method of instruction is to equate
the practice and control groups at the beginning of the experiment by
the use of tests. At the end of the training period similar tests are
administered and the difference in average gains by the respective
groups taken as a measure of the superiority of the instructional
method. Unfortunately such differences are usually slight and need to
be interpreted with great care. They may be assignable to fluctuations
in sampling, to variation in response, or to both and possibly other such
errors. Formula (6) is therefore a useful device for problems of this
type.

An examination of equation (6) indicates that the reliability coefficient
of the test is required. This may be obtained by repeating the
test after a short interval either at the beginning or at the end of the
experiment. As a supposititious example let us assume that the mean
gain for the practice group is 20 ± 2, while that for the control group
is 14 ± 3, the probable errors being obtained by formula (6). The
difference between the means may then be written
$20 - 14 = 6 \pm \sqrt{2^2 + 3^2} = 6 \pm 3.6$. In such a case the difference would
not be considered significant inasmuch as it is less than twice its probable error.
Such an experiment is therefore inconclusive.
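The arithmetic of this supposititious example may be sketched as follows. The function names are hypothetical, and the criterion of twice the probable error is the one used in the text above; other workers might adopt a stricter multiple.

```python
import math


def difference_with_pe(mean_a, pe_a, mean_b, pe_b):
    """Difference of two means and its probable error, errors assumed uncorrelated."""
    diff = mean_a - mean_b
    pe_diff = math.sqrt(pe_a ** 2 + pe_b ** 2)
    return diff, pe_diff


def is_significant(diff, pe_diff, multiple=2.0):
    """True when the difference is at least `multiple` times its probable error."""
    return abs(diff) >= multiple * pe_diff


# Practice group gain 20 ± 2, control group gain 14 ± 3.
diff, pe_diff = difference_with_pe(20, 2, 14, 3)
print(diff, round(pe_diff, 1), is_significant(diff, pe_diff))
```

For the figures given, the difference of 6 falls short of twice its probable error of about 3.6, so the comparison is reported as not significant.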
Relation of Scoring Error to the Scaling of Product Tests

The neglect of scoring error has led to much unnecessary refinement
in the scaling of product tests. Scale specimens are frequently
graduated to the second decimal place in P.E. scale units whereas the
probable error of scoring for several scorers is likely to be greater than a
scale unit. Assuming that teachers could be so trained as to reduce
this probable error on a composition such as Trabue G-3 cited above, say
from 1.73 to 0.5, it is questionable whether it pays to express the values of the
scale specimens beyond the nearest scale unit. It is surely confusing
to express them in units finer than can be discriminated in scoring.

Relation of Response Error to the Scaling of Performance Tests

A common method of scaling performance tests is to convert the
point score or number of correct responses into a scale score expressed
in units of group variability. The conversion may be accomplished
for each item individually or by using the total point score as suggested
by McCall and others. All of the methods are essentially the same
as that employed by Pearson in his classic study of intelligence. The
integral of the normal curve is employed as the index variable. The
advantages of such scaling are that scores are expressed in comparable
units from a supposedly suitable zero point. The fundamental assumption
is that degree of difficulty is a measure of any ability, and this is of
course very questionable. Nevertheless it is probably the best single
objective indication that we have.

If such scaling be abandoned in favor of crude point score there will
be a loss of the difficulty unit and at the same time of the zero point.
It is worth while to raise the question now as to whether these losses
are irreparable for the type of measurement which it is possible to
achieve with performance tests.

In standardizing a set of 40 questions in French grammar the writer
had occasion to study the desirability of weighting the items in various
ways. The first procedure was to weight each question on the basis
of the responses of some 300 pupils. The correlation between simple
point score and this refined scale score was then worked out and found
to be over 0.99. A second plan was to scale the total score according
to the method of McCall.¹ Again the correlation between weighted
and unweighted score was over 0.99. The reliability coefficient of the
test was found to be about 0.90 using the same group. Similar
results have been found by Charters, Douglass, Monroe and others who
have decided in some cases to drop weights on their tests on the basis of
these findings. The issue is essentially this: If the correlation between
weighted and unweighted score is considerably higher than the
reliability coefficient of the test, does it pay to use the refined weighted
units? As in the case of the product scale such weighting may be an
elaborate and unnecessary refinement.

¹ McCall, W. A.: A Proposed Uniform Method of Scale Construction. Teachers
College Record, Vol. XXII, January, 1921, pp. 31-61.

As a further check on the above studies in weighting, different 
types of material varying in length and difficulty need to be examined. 
The Hotz weighted algebra problems were studied by the same method.
For a series of five or six problems varying considerably in weight,
the reliability coefficients were found to be approximately 0.7 and the
correlation between point and scale score about 0.9. With arithmetic
problems and intelligence components corresponding values of 0.8
and 0.95 were found. In no case did the reliability coefficient equal 
the correlation between weighted and unweighted scores. This latter 
correlation may be made to run higher than 0.98 by suitable graduation 
and lengthening of the test material. With linear regression this 
means that 

Scale score = a (point score) + b, where a and b are constants, i.e.,
scale score is approximately a linear transposition of point score which
amounts to a magnification and shift of origin. The magnification
if desired can be accomplished by much simpler means, e.g., by multiplying
each score by a constant, but there is no gain in accuracy by
using such units.
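The point that a linear transposition adds nothing to accuracy can be illustrated with a short sketch. All scores below are invented for illustration, and the constants a = 2.5 and b = 30 are arbitrary assumptions; the only claim shown is that a correlation is unchanged when one variable is multiplied by a positive constant and shifted.

```python
import math


def pearson(xs, ys):
    """Product-moment correlation of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical point scores and scores on a second application of the test.
point_scores = [12, 18, 22, 25, 27, 31, 35]
retest_scores = [14, 17, 25, 23, 29, 30, 36]

# A "scale score" that is an exact linear function of the point score.
scale_scores = [2.5 * p + 30 for p in point_scores]

r_point = pearson(point_scores, retest_scores)
r_scale = pearson(scale_scores, retest_scores)
```

Here `r_point` and `r_scale` agree to within rounding, so a scale score that is linear in the point score cannot correlate more closely with a retest, or with any other criterion, than the point score itself.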

By using the point score the origin is “no score made” instead of
the theoretical zero point, “just no ability” or difficulty. This loss of
the theoretical zero point does not seem to the writer to be a serious
one. In the first place quite different zero points are obtained by
different methods of scaling, e.g., by individual questions or by total
score. The method of determination is thus arbitrary. Furthermore 
such a zero point is always a function of the difficulty of the particular 
material used in the test. If more difficult material is used the zero 
point shifts. The interpretation, “just no ability” is thus a relative 
matter. 

The chief reason for wanting such a zero point is that one may be 
able to say “John has twice as much ability as Henry.” At first 
sight this looks like an interesting and valuable comparison to make, 
but practically it is of little value, and since it is always made relative
to the particular material of the scale may be very misleading. Supposing,
however, that the ability of the two boys has been so expressed,
what advantage has such a comparison over that obtained by difference
in score from any reference point? In temperature we measure from
the arbitrary point zero degrees Fahrenheit. Clearly 20°F. is not twice
as hot as 10°F., yet the comparison has just as much meaning for the
purpose for which such measurements are made as if absolute zero had
been employed. There seems to be no good reason why the value “no
point score” may not replace the theoretical and more nebulous zero
points from scaling without serious loss in interpreting results. If 
another reference point is desired for refined comparison with different 
tests, the mean of the respective distributions is clearly the most 
stable and useful. 

The determination of difficulty values is often useful in graduating
the test material to furnish comparable sets and to arrange the items
so as to insure smooth progress by the pupil in taking the test. The
Henmon French Tests furnish an illustration. In determining ratings,
however, point score may be substituted for refined scale score if the
material is sufficiently long and well graded. This simpler procedure
will insure a precision in harmony with the possible accuracy of such
measurements, and will be as practically useful as if the theoretical
zero point had been employed.



AN INTERPRETATION OF LAY ATTITUDES TOWARD
INTELLIGENCE TESTS

DONALD A. LAIRD
University of Wyoming

In a recent paper Knight¹ presents the results of applying intelligence
tests to a group of teachers who were given the privilege of writing
their names on the test papers or not, as they might choose. He found
significant differences in the intelligence scores of the groups thus
separated. On the whole the group that did not sign the test papers
scored consistently lower.

The present communication is concerned with the attitudes of a
group of 55 students in elementary psychology toward the Thorndike
Intelligence Examination for high school graduates. These students
were freshmen, who had taken this intelligence test as a routine part
of their admission to the university. They were given a written
assignment of writing seriously of this test, weighing everything
carefully and finally concluding as to whether they were “for” or
“against” the test. This was assigned and completed before they
knew what their scores were.

The papers were sorted into two groups, one for and the other
against this test. Then the composite score made by each student
was marked on the paper and a distribution arranged by groups
according to the score, using 5-point intervals. The results are given
in the following table:

Table I

                      Frequency
Score interval      For    Against
100-104              1        0
 95- 99              2        0
 90- 94              1        1
 85- 89              4        0
 80- 84              2        0
 75- 79              5        2
 70- 74              3        6
 65- 69              5        1
 60- 64              3        1
 55- 59              2        3
 50- 54              1        6
 45- 49              1        2
 40- 44              0        2
 35- 39              0        1
Totals              30       25

¹ Knight, F. B.: The Significance of Unwillingness to be Tested. Journal
of Applied Psychology, 1922, Vol. VI, pp. 211-213.

The opinions against the test were voiced by those who did not
do well on the test, although at the time of writing their opinions
they had no means of comparing their scores with the scores made by
others. Thus, although they were instructed to list all the pros and
cons in their paper before even trying to decide for themselves what they
should think about the test, we find cropping out the unconscious
realization of their mediocrity in the weighed opinions.

[Chart I. Surfaces of frequency of test scores for the “for” and “against” groups.]

A significant feature appears in these data as graphed in Chart I,
which is a surface of frequency by the two groups. It will be noted
that the frequency surface of the “for” group resembles, as closely
as one would anticipate with 55 cases, a normal distribution curve.
The other curve, however, is distinctly bi-modal. These few students
of higher intelligence who are grouped with those of lower intelligence
in their opinions do not controvert the generalization which has just
been made. For instance the three high men in the “against” group
comprise the freshman debating team. The opinions against the
intelligence test in the students with higher scores have probably
been reached in the light of reason, more nearly pure and simple, than
is the case with the lower scoring members of the “against” group.

Another approach to this problem was obtained in recording 
the degree of cooperation of each of these same students in a test 
designed to see if professors could tell the intelligent students by the 
pictures of the latter. The students were requested to hand in to 
their instructor a recent photograph or snap shot. One week later 
a reminder was given of these promised pictures. A week from the 
date of this reminder another request was given. The returns, again 
grouped by score interval, were as follows: 


Table II

                           Pictures in:
Score interval   First week   Second week   Third week   Not yet in
100-104              1
 95- 99              1           1 1
 90- 94                          1 1
 85- 89              2            1             1
 80- 84              1                          1
 75- 79              2            3             2
 70- 74              3            3             2            1
 65- 69              2            1             2            1
 60- 64              1                          1            2
 55- 59              1            1                          3
 50- 54              2            1             1            3
 45- 49              1                                       2
 40- 44                                                      2
 35- 39                                         1

These results confirm those reported in the first part of this note 
and the statements of Knight. 

This is but another example added to the many already advanced
regarding the dominance of reasoning by personal motives rather
than logical principles. Opposition to intelligence testing may arise
from well grounded arguments, or it may arise, as the present communication
shows, from feeling that the score may be low or the test
embarrassing.










COMPARISON OF AMERICAN AND FOREIGN CHILDREN
ON INTELLIGENCE TESTS¹

RUDOLF PINTNER
Teachers College, Columbia University

There seems to be much difference of opinion at the present time
among psychologists interested in intelligence tests as to the validity
of the conventional verbal group test as a measure of intelligence for
foreign children. Some are inclined to believe that such tests give an
accurate rating of our foreign children and that their reputed language
handicap is itself an index of lack of intelligence. In this connection,
Young² has shown that correlations of intelligence ratings with
teachers' estimates and school work generally run higher for a verbal
than for a non-verbal test. But we should not forget that a teacher’s
estimate of a child’s intelligence will unquestionably be influenced by
the child’s ability to use the English language, and, of course, all the
child’s school work is conditioned by his ability to understand and
make use of English. It may be true, therefore, that for purposes of
classification a verbal test is as good as a non-verbal, because ability
to get on in school requires the use of the English language. If, however,
the school wishes to select the brighter foreign children for
special work in English, a verbal test may not be so good.

The question of prognosis value for school purposes must not be
confused with the question of the absolute intelligence of different
racial groups. It seems to the writer that non-verbal tests alone are
adequate for this purpose. It is inconceivable that children living
in an English-speaking environment, hearing, speaking, reading
nothing but English should not have a distinct advantage in tests
requiring the finding of opposites of words, the hunting for an appropriate
analogy, the filling in of an uncompleted sentence, and the like, as
compared with children who hear a foreign language at home and in
many cases are required to communicate in a foreign language to some
people in their environment. Such contrasting groups are very far
from having had equal previous practice on the elements which go to
make up the usual verbal test.

¹ The writer wishes here to acknowledge the help of Mrs. A. H. Talbot in gathering
the data necessary for this article, and in making the necessary tabulations.

² Young, K.: “Mental Differences in Certain Immigrant Groups.” Univ. of
Oregon, Publication No. 11, July, 1922.
In this connection the following data gathered in a New York City
school may be of interest.³ The children in the third and fourth grades
were all given the National Intelligence Test, Scale A, Form 1, and the
Pintner Non-language Test. The distribution curves for both tests
show much the same type of distribution with no zero scores and no
perfect scores. All scores were then converted into mental ages and
Table I shows the percentage distribution of mental ages for the various
nationality groups and for all groups combined. The American
children were largely of Irish descent. Under German were included
all children whose parents were born in Germany or Austria (the
former Austrian Empire) and, therefore, in this group there are a
number of Slavic nationality, judging by the family name. The
Polish and so-called German groups are small and of little consequence.

The median mental age for the total group shows a higher mental
age on the Non-language as compared with the National, 9 years, 4
months against 8 years, 9 months. This may mean that the children
on whom the norms for the National were based were in general slightly
superior to those who were used in the standardization of the Non-language,
or else that the children in this particular school were in general
somewhat inferior in such verbal ability as is tested by the
National Intelligence Test. This superiority on the Non-language
Test is true not only of the foreign groups, as one might expect, but also
of the American group where we have a median mental age of 9 years,
4 months on the Non-language and 9 years, 0 months on the National.
Comparing the separate nationality groups we note that the medians
on the Non-language for the Polish and Germans are above the median
for the Americans whereas the median for the Italians is below. For all
the foreign children combined the median is the same as for the
American. On the National Test all the medians fall below the American
median. A similar relationship holds in both tests for the upper
and lower quartile points.

Our best comparison of the tests can be obtained by a study of the
percentage of any foreign group reaching or exceeding the median of
the American group. These percentages are as follows:

Percentage of foreign group reaching or exceeding Median Mental
Age of American group on tests:

                   Non-language   National Intelligence
Italian                 43                 36
Polish                  61                 41
German                  62                 30
Total foreign           50                 37

³ The writer wishes here to thank Miss Martha Wilson, Principal of Public
School 127, Manhattan, for her assistance in obtaining the necessary information
as to the nationality of the children, and also for her kindness and cooperation
while the tests were being given in her school.


Here we see that there is no difference between the foreign group
as a whole and the American group on the Non-language Test. The
curves of distribution are practically identical, as can also be seen from
Table I, where the medians and Q's are all the same. On the National
Test, however, the difference between the American and foreign group
is quite marked. Only 37 per cent of the latter reach the median of
the American group. The two groups, therefore, are markedly differentiated
on this test, although the overlapping is still considerable.
When we examine the different foreign groups, we see that the Italian
group, which is large enough to afford a fair comparison with the
American group, falls below the American on both the Non-language
and the National Tests. The difference between the two groups on
the Non-language Test is not very great but there is still a difference.
This seems to be in agreement with the majority of studies of this
national group. All reports indicate the inferiority of the Italians on
all kinds of intelligence tests, but the writer is inclined to believe that
the discrepancy between the groups as usually shown by means of
verbal tests over-emphasizes greatly the intelligence difference between
Italians and Americans. The present data, although slight, support
the previous results reported by the writer⁴ and they would seem to him
to indicate caution in drawing conclusions as to the intelligence of
foreign children when tested solely by means of tests which presuppose
the understanding or reading of the English language.

⁴ Pintner, R., and Keller, R.: Intelligence Tests of Foreign Children. J. of
Ed. Psych., Vol. XIII, No. 4, April, 1922, pp. 214-220.







MENTAL AGE EQUIVALENTS FOR A GROUP OF NON-READING
TESTS OF THE HERRING REVISION
OF THE BINET-SIMON TESTS

CHARLES E. WILNER

Research Assistant, Bureau of Educational Research,
Bloomsburg (Pennsylvania) State Normal School

Despite the fact that the Herring Revision was intended primarily
for the classification of normal children, it has come into some use in
the psychological clinic for the purpose of rating defectives. Dr.
Grace H. Kent, psychologist of the Worcester State Hospital, states
that the usefulness of the Herring Revision for this purpose is lessened
because of the number of tests which require reading ability on the
part of the subject, since children who can read with fair fluency are
rarely sent to the clinic. Miss Kent finds, however, that 16 of the
non-reading tests of the Herring Revision are well suited to the work
of the psychological clinic, and it is at her suggestion that the attempt
has been made to obtain mental age equivalents of scores in these tests
alone. The 16 tests selected by her are Nos. 1, 5, 7, 8, 9, 12, 14, 18, 19,
24, 25, 26, 31, 32, 33, 34.

There were available for the derivation of mental age equivalents
for the Kent Group, records of 270 persons who had taken both the
Stanford and the Herring Revisions. Of these, 154 were those used
in the original standardization of this Revision. This group included
children from the Garden City public schools, Scarboro School, and
Letchworth Village Institution for the Feeble-minded. The examiners
were Miss Grace Taylor, Raymond H. Franzen and John P. Herring.
The other 116 examinees were 12-year-old children from the public
schools of Bloomsburg, Pennsylvania. All 12-year-olds in this public
school system, except a few who were absent when the examinations
were given, were included in this group. The examiner was Mrs.
Marjorie H. Wilner.

The norms were derived by equating the decile points of the distribution
of scores in the Kent Group with the corresponding decile
points of the distribution of Stanford mental ages. These decile points
were as follows:

Decile   Stanford MA's   Herring (Kent) Scores
  1          82               39
  2          87               42.71
  3          97               51.25
  4         118.6             59.8
  5         131.5             63.68
  6         138.33            65.82
  7         147.67            68.33
  8         165.5             70.89
  9         173               73.43

The resulting relation line was then smoothed by taking as the true
value of each step, the arithmetic mean of its value and the values of
the 6 successive steps each side of it. Values for points below the first
decile and above the ninth decile were found by rectilinear extrapolation
beyond these points. This gave the series of mental age
equivalents in the following table:

Mental Age Equivalents
Kent Group, Herring Revision of Binet-Simon Tests

Score   MA    Score   MA    Score   MA    Score   MA    Score   MA
  1            17     53     33     74     49     95     65    136
  2            18     55     34     76     50     97     66
  3     35     19     56     35     77     51     98     67    143
  4     37     20     57     36     78     52    100     68    147
  5     38     21     58     37     79     53    102     69    150
  6     39     22     60     38     81     54    104     70    155
  7     40     23     61     39     82     55    107     71    161
  8     42     24     62     40     83     56    109     72    167
  9     43     25     64     41     85     57    112     73    172
 10     44     26     65     42     86     58    114     74    178
 11     45     27     66     43     87     59    117     75    184
 12     47     28     68     44     89     60    120     76
 13     48     29     69     45     90     61    123     77    196
 14     49     30     70     46     91     62    126     78    202
 15     51     31     72     47     92     63    130     79    208
 16     52     32     73     48     94     64    133     80    214
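The equating of decile points can be sketched as a piecewise linear relation line between Kent scores and Stanford mental ages. This is a rough illustration only: it joins the nine decile points directly and extends the end segments in a straight line, it omits the smoothing step described above, and so it will not reproduce the published equivalents exactly. The function name is hypothetical, and the fourth Kent decile is taken as 59.8 on the assumption that the deciles must increase.

```python
# (Kent score, Stanford mental age in months) at deciles 1 to 9.
DECILE_POINTS = [
    (39.0, 82.0),
    (42.71, 87.0),
    (51.25, 97.0),
    (59.8, 118.6),   # assumed reading of the fourth decile
    (63.68, 131.5),
    (65.82, 138.33),
    (68.33, 147.67),
    (70.89, 165.5),
    (73.43, 173.0),
]


def kent_to_mental_age(score, points=DECILE_POINTS):
    """Unsmoothed mental age (months) for a Kent Group score.

    Interpolates linearly between neighbouring decile points and
    extrapolates along the first or last segment outside their range.
    """
    low, high = points[0], points[1]
    for left, right in zip(points, points[1:]):
        low, high = left, right
        if score <= right[0]:
            break
    (x0, y0), (x1, y1) = low, high
    return y0 + (score - x0) * (y1 - y0) / (x1 - x0)
```

A call such as `kent_to_mental_age(60)` gives a value a little above the fourth-decile mental age of 118.6 months, which is close to the tabled equivalent of 120; larger departures are to be expected where the smoothing altered the line.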
A comparison of the mental ages derived from the Kent group
alone, with the Stanford mental ages showed a coefficient of correlation
of 0.9627 (Pearson product moment; data grouped in class intervals
of 5). The mean of the Herring-Kent mental ages was 127.05
months, of the Stanford 127.16. The SD of the Herring-Kent MA’s
was 35.36 months, of the Stanford 35.30 months. The coefficient of
correlation between the Kent Group and Group E of the Herring was
0.9670. The Mean MA of Group E was 124.95; the SD 36.05. (In all
cases above, n = 270.)

From these data it is concluded that: 

1. Certain non-reading tests of the Herring Revision may be used 
alone for the purpose of estimating a mental age. 

2. The mental ages derived from these tests will have the same
meaning as, and be comparable with, mental ages derived from the
complete Herring Revision or from the Stanford Revision.



COMMUNICATIONS AND DISCUSSIONS 


To the Editors of the Journal of Educational Psychology: 

I have just read with great interest the article by Rena Stebbins
and L. A. Pechstein in the October number of your journal. I think the
main theses and conclusions of the authors are very true and worthy
of emphasis. However I deprecate for two reasons the appearance
at this time in your journal of the table on page 388. My most serious
objection is that this table is based upon provisional norms originally
furnished by the National Research Council when the National
Intelligence Tests were first published, but which were superseded in
the National Intelligence Tests Manual, 1921 Revision. I believe
that sufficient harm has already been done by persons using the provisional
norms given in the Manual for 1920. The later Manual shows
that the old norms not only were too high for all ages, but were particularly
high for the younger children. The use of such inexact norms
results both in the lowering of the mental ages found for all children,
and in especially penalizing the younger ones. Deductions based
upon such results lead to false conclusions. IQ's are too low, AQ's are
too high, and teachers of older children suffer by comparison with those
of younger children. These facts will explain some of the findings in
the study by Miss Stebbins and Dr. Pechstein, such as those given in
Table II, page 394, in which the average IQ's for 87½ per cent of
the 16 groups studied are below 100; in which the AQ's are all 100 or
over; and in which the lower grades studied have on the average
considerably higher AQ's than the higher grades.

Another objection I have to the table on page 388 is that it interprets
the age of eight years as given in the National Intelligence Tests
Manual as being equivalent to eight years, no months, instead of eight
years, six months, and so on for each age following. Although the
1920 Manual does not definitely state what its meaning is in this
regard, common usage among research workers would lead us to the
latter interpretation, and the 1921 Manual does definitely state that
this is the correct one. If the findings of Miss Stebbins and Dr.
Pechstein were reevaluated with this latter error of interpretation of
scores removed, another lowering of their AQ's would occur.

Their article is of value in setting forth method. However it is
illustrative also of the great care which must be taken in the choice and
interpretation of norms when this method is pursued. I, myself,
fell into error, through the use of these same norms, in a study I made
of IQ's and AQ's in January, 1921, before the publication of the revised
Manual. I should like to save others from similarly misinterpreting
their data, through the use of provisional norms, which have since been
superseded.

Yours truly, 

Katharine Murdock.

Halekulani, Honolulu, T. H.



NOTE ON THE USE OF SPEARMAN’S PROPHECY 
FORMULA FOR RELIABILITY 

KARL J. HOLZINGER
University of Chicago


One of the most important laws which has come to be recognized in
the preparation of test material is that a long test is in general more
reliable than a short one. Reliability may be here defined as the consistency
with which a test measures what it purports to measure, the
consistency being indicated by the correlation between two applications
of the same test or of equivalent forms. Spearman¹ and later
Brown have expressed the degree of reliability to be expected by lengthening
tests in a formula which may be written

$$r_{NN} = \frac{N r_{xx}}{1 + (N - 1) r_{xx}}$$

where $r_{NN}$ is the reliability coefficient on pooling $N$ tests of equal
length and reliability, and $r_{xx}$ the reliability coefficient of the individual
tests (or average of several).

When $r_{xx}$ has been determined, the above formula is a function
of $r_{NN}$ and $N$ so that it appears a simple matter to find $N$ for any
required reliability $r_{NN}$. Suppose $r_{xx} = 0.5$ and we wish the final
lengthened test to have a reliability of 0.9. The equation then becomes

$$0.9 = \frac{0.5N}{1 + 0.5(N - 1)}, \quad \text{or } N = 9,$$

so that it will be necessary to
increase the test to nine times its original length to secure the desired
reliability of 0.9. It is further evident that when $r_{xx}$ has any value
except zero $r_{NN}$ approaches +1 as a limit, for

$$\lim_{N \to \infty} r_{NN} = \lim_{N \to \infty} \frac{r_{xx}}{\dfrac{1}{N} + r_{xx} - \dfrac{r_{xx}}{N}} = +1$$


This would lead one to expect that by continual lengthening we could 
approach perfect reliability as closely as we please. Experience with 
test and children, however, shows that this is absurd. The law over¬ 
states the reliability to be expected, and it is important to know how 
much, where in the series, and why such overprediction occurs with 
given types of material. 
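
The prophecy formula and its inversion for the required length can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function names `prophecy` and `required_length` are assumptions introduced here, not anything from the note. A single-test reliability of 0.5 and a target of 0.9 give N = 9, the ninefold lengthening mentioned above, and the loop shows the predicted value creeping toward +1 as N grows.

```python
def prophecy(r_xx: float, n: float) -> float:
    """Predicted reliability of a test lengthened n-fold (Spearman-Brown)."""
    return n * r_xx / (1.0 + (n - 1.0) * r_xx)


def required_length(r_xx: float, target: float) -> float:
    """Solve the prophecy formula for n, given the target reliability.

    Rearranging r_NN = n*r / (1 + (n - 1)*r) gives
    n = r_NN * (1 - r) / (r * (1 - r_NN)).
    """
    if not 0.0 < r_xx < 1.0 or not 0.0 < target < 1.0:
        raise ValueError("both reliabilities must lie strictly between 0 and 1")
    return target * (1.0 - r_xx) / (r_xx * (1.0 - target))


if __name__ == "__main__":
    # Length multiple needed to raise a reliability of 0.5 to 0.9.
    print(required_length(0.5, 0.9))
    # The formula's prediction approaches (but never reaches) +1.
    for n in (1, 10, 100, 1000):
        print(n, round(prophecy(0.5, n), 4))
```

The inversion is only meaningful under the formula's own assumption that the added tests are equivalent in length and reliability to the original, which is the assumption the note goes on to question.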

In the present experiment the reliability coefficients of the ten
components of the Terman Scale were determined from forms A and
B on a group of 135 pupils. The results are given in Table I. The
individual components are not equivalent as to material and length,
but are sufficiently so for present purposes. The mean reliability
coefficient is 0.679 and the correlation by pooling, 0.916. Furthermore
this last value is greater than for any individual component so that
there is clear evidence of increased reliability by lengthening of the
material.

* Spearman, C.: Correlation of Sums and Differences. Brit. Jr. Psy., Vol. V,
pp. 491-420.

Table I. Reliability Coefficients for the Terman Scale by Components
and Total Score

Component    Correlation between forms A and B
    1            .638
    2            .809
    3            .682
    4            .900
    5            .862
                 (mean of first five correlations = .776)
    6            .482
    7            .083
    8            .630
    9            .614
   10            .702
                 (mean of last five correlations = .582)
  Mean           .679
  All            .916

Figure 1. Theoretical and Experimental Reliability Trends based on the
ten Terman Components.

It is now possible to answer the question as to how much the
prediction formula overstates the expected result from pooling.
Substituting $r_{xx} = 0.68$ in this equation gives $r_{(10)(10)} = 0.96$,
whereas the obtained value is less than 0.92. The overprediction then
amounts to about 0.04, a very considerable difference with such high
correlation.

In order to determine where the overstatement occurs it is necessary
to apply Spearman's formula to successively pooled components.
Thus the first component of form A is correlated with the first in form
B, then tests 1 and 2 of A with 1 and 2 of B, and so on until all ten of
one form have been correlated with the ten of the other form.
Theoretical and experimental values may then be compared as various
numbers of tests are pooled. As a check the components were also
amalgamated in the reverse order. Table II and the accompanying
diagram show the results.
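
The successive-cumulation procedure just described can be sketched as follows. The pupils' raw scores are not given in the note, so the data here are hypothetical, produced by an assumed helper `simulate_forms` in which each score is a pupil's ability plus random error. The names `pearson` and `cumulative_reliabilities` are likewise introduced only for illustration.

```python
import random
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cumulative_reliabilities(form_a, form_b, order):
    """Correlate running totals of form A with running totals of form B.

    form_a[p][c] is pupil p's score on component c of form A (likewise
    form_b). `order` lists component indices in the order of pooling.
    """
    n_pupils = len(form_a)
    total_a = [0.0] * n_pupils
    total_b = [0.0] * n_pupils
    trend = []
    for c in order:
        for p in range(n_pupils):
            total_a[p] += form_a[p][c]
            total_b[p] += form_b[p][c]
        trend.append(pearson(total_a, total_b))
    return trend


def simulate_forms(n_pupils=135, n_components=10, seed=0):
    """Hypothetical data: each score is a pupil's ability plus random error."""
    rng = random.Random(seed)
    form_a, form_b = [], []
    for _ in range(n_pupils):
        ability = rng.gauss(0.0, 1.0)
        form_a.append([ability + rng.gauss(0.0, 1.0) for _ in range(n_components)])
        form_b.append([ability + rng.gauss(0.0, 1.0) for _ in range(n_components)])
    return form_a, form_b


if __name__ == "__main__":
    a, b = simulate_forms()
    forward = cumulative_reliabilities(a, b, range(10))          # order 1 to 10
    reverse = cumulative_reliabilities(a, b, range(9, -1, -1))   # order 10 to 1
    for k, (f, r) in enumerate(zip(forward, reverse), start=1):
        print(k, round(f, 2), round(r, 2))
```

Whatever the order of pooling, the last entry of each trend is the correlation of the full totals, so the two orders must agree once all ten components are included, as they do in Table II.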


Table II. Theoretical and Experimental Reliability Coefficients
Obtained from Spearman's Formula and by Successive Cumulation
of the Ten Terman Components

Number of tests    Theoretical    Order of cumulation
   cumulated          value        1 to 10    10 to 1
       1               .68           .64        .70
       2               .83           .81        .79
       3               .87           .87        .83
       4               .90           .91        .87
       5               .92           .90        .84
       6               .93           .88        .86
       7               .94           .89        .87
       8               .94           .87        .87
       9               .96           .91        .90
      10               .96           .92        .92


The table and diagram indicate that for the first order of cumulation
the formula gives a good prediction up to four components, but
that thereafter considerable overstatement occurs. This might
be accounted for in part by the fact that the first five individual
reliability coefficients as seen in Table I are higher than the last five,
so that the amalgamation of the latter tests might not be expected to
increase the reliability any further. The reverse cumulation, however,
shows that rapid increase in reliability for the last four tests is not due
primarily to high individual coefficients. Furthermore the addition
of such components as 4 with a reliability of 0.9 does not increase the
trend appreciably when the amalgamation is made late in the series;
e.g., in cumulating from 1 to 10 the addition of test 4 raises the trend
0.04, but in cumulating from 10 to 1 it raises it only 0.01.

The general result then appears to be that reliability increases very
rapidly with the first four or five tests pooled, but increases thereafter 
more slowly than the prediction formula would lead us to expect. 
Moreover the trend is determined chiefly by the number of tests 
cumulated and is not affected appreciably by highly reliable 
components pooled late in the series. 
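
The size of the overstatement at each stage can be checked by setting the formula against the experimental columns of Table II. The sketch below assumes the observed coefficients as transcribed from that table and a single-component reliability of 0.68. The names `prophecy` and `overprediction` are illustrative only, and the values the formula yields here need not agree to the last digit with the printed theoretical column.

```python
R_XX = 0.68  # mean single-component reliability used in the text

# Observed coefficients transcribed from Table II (k = 1..10 tests pooled).
OBSERVED_1_TO_10 = [.64, .81, .87, .91, .90, .88, .89, .87, .91, .92]
OBSERVED_10_TO_1 = [.70, .79, .83, .87, .84, .86, .87, .87, .90, .92]


def prophecy(r_xx, n):
    """Spearman-Brown prediction for n pooled tests of reliability r_xx."""
    return n * r_xx / (1 + (n - 1) * r_xx)


def overprediction(observed, r_xx=R_XX):
    """Formula value minus observed value for k = 1, 2, ... tests pooled."""
    return [prophecy(r_xx, k) - obs for k, obs in enumerate(observed, start=1)]


if __name__ == "__main__":
    gaps_fwd = overprediction(OBSERVED_1_TO_10)
    gaps_rev = overprediction(OBSERVED_10_TO_1)
    print("k  formula  gap(1 to 10)  gap(10 to 1)")
    for k, (f, r) in enumerate(zip(gaps_fwd, gaps_rev), start=1):
        print(f"{k:2d}  {prophecy(R_XX, k):.3f}  {f:+.3f}  {r:+.3f}")
```

A positive gap means the formula promises more reliability than was obtained; for the first order of cumulation the gap stays small through four components and is positive from the fifth onward.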

In order to study the reasons for such over-prediction it is neces¬ 
sary to examine and test the assumptions underlying the proof of the 
formula. This will not be attempted in the present paper, but it may 
be sufficient to note here that Brown omits this equation from the 
recent edition of his text on mental measurements. 



NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 
OTHER MAGAZINES 


REPORTED BY CECILE COLLOTON

Department of Educational Psychology, The Lincoln School of Teachers College

Intelligence Tests

Mental Tests as an Aid in the Analysis of Mental Constitution. Harry J.
Baker. Journal of Applied Psychology, 1922, December, 349-377. Condensation
of a Ph.D. dissertation. Twenty-six tests of general intelligence and specific
abilities were administered to 26 high school students and 26 college students.
Eight individual cases are reported in detail and 31 general conclusions are made.
Bibliography gives 36 references.

A Comparison of Three Tests of "General Intelligence." Morris S. Viteles.
The Journal of Applied Psychology, 1922, December, 391-402. A study of the
performance of 69 students of the Wharton School of Finance and Commerce
of the University of Pennsylvania on the Otis General Intelligence Test, Army
Alpha, and the Morgan Mental Test. Great variability in the different tests and
lack of correlation between test results and school grades are noted.

Freshmen Grades and Mental Tests. W. G. Binnewies. Educational
Administration and Supervision, 1923, March, 161-162. A study of freshman college
grades and group intelligence tests shows a correlation of about 0.4.

An Initial Inventory of the Mental Capacities of Primary Children. H. E.
Vander Zalm. Education, 1923, March, ^0-445. A general discussion of the
need for classification by mental tests in the primary grades. Special mention is
made of the Detroit First Grade Intelligence Test.

Group Intelligence Examinations for Primary Pupils. O. J. Johnson. The
Journal of Applied Psychology, 1922, December, 403-416. Part I lists all existing
primary examinations and compares them in detail as to directions, methods of
scoring, kinds of tests, etc. Part II describes fully the Non-verbal 2 Intelligence
Examination for Primary Pupils.

Measures of General Intelligence as Indices of Success in Trade Learning. Carl
M. Cowdery. Journal of Applied Psychology, 1922, December, 311-330. Reports
a study of the boys at the Whittier State School, Whittier, California, who are
engaged in the learning and performing of trade work. Individual intelligence
tests and a three years' accumulation of rating on 22 different trade groups make
up the data of the study.

The Relation of Intelligence to Age in Negro Children. Ada Hart Arlitt. The
Journal of Applied Psychology, 1922, December, 378-384. One hundred and
eighty negro children of New Orleans and 83 of Philadelphia tested by the
Stanford-Binet show that at ages five and six, negroes are superior to whites of the
same social status. Beyond six, negroes become increasingly inferior with age.

The Influence of Certain Exercises in Silent Reading on Scores in the Otis Group
Intelligence Test. Wendell White. Educational Administration and Supervision,
1923, March, 179-182. Significant increases in test scores are obtained after
drill on special reading exercises designed solely to develop speed in reading.

Improvement in Teachers' Estimates of Intelligence. W. D. Buchanan. The
Elementary School Journal, 1923, March, 542-549. Training teachers to ignore
school achievement and character traits in estimating intelligence brings about a
high correlation between teachers' ratings and test scores.

A Criterion of the Quality of Teaching. Dudley W. Willard, and Curtis T.
Williams. Educational Administration and Supervision, 1923, March, 147-159.
A comparison of teachers' marks and the test scores of 236 eighth grade and high
school pupils of Kent, Washington, on the Terman Group Test of Mental Ability.
Interviews with teachers concerning basis of grading are reported in detail.

Intelligence Levels among State Normal School Graduates. Frederick L.
Whitney. Journal of Educational Research, 1923, March, 229-236. Studies of the
intelligence of normal school students and graduates based on the Army Alpha
show favorable comparisons with other college students and professional and
occupational groups.

Educational Tests

The Development and Comparative Values of Composition Scales. Earl
Hudelson. The English Journal, 1923, March, 163-168. Lists and describes the
various composition scales devised from 1903 to the present time.

A Comparative Study of the Vocabulary Content of Certain Standard Reading
Tests. H. L. Ballinger. The Elementary School Journal, 1923, March, 522-534.
A comparison of the words in 14 well known reading tests, the Thorndike Word
List, the Horn Word List, and the vocabulary content of 80 first, second and
third grade readers. Eleven words are common to the 14 tests. Of the 2030
words in the 14 tests, 1100 appear only once in either the Thorndike or the Horn
list.

On Improving Algebra Tests. David Eugene Smith. Teachers College Record,
1923, March, 87-94. Examples from various algebra tests quoted and criticised.

An Experiment to Determine the Effectiveness of Practice Tests in Teaching 
Beginning Reading. Nila Banton Smith. Journal of Educational Research, 
1923, March, 213-228. A description of a new method for teaching beginning 
reading used in Detroit with great success. 

Some Limitations of Educational Tests. V. A. C. Henmon. Journal of
Educational Research, 1923, March, 186-198. Comparative studies of different
tests in American history, algebra, and reading show the unreliability of the tests
as a measure of individual achievement.

Spelling Age Computed from the Score on Fifty Per Cent Lists. Walter E.
Morgan. Journal of Educational Research, 1923, March, 236-243. Describes
a technique for computing spelling age in Grades II to VIII using the
Buckingham-Ayres Spelling Scale.

Mental and Educational Measurements

The Educational Significance of Mental Tests. B. H. Bode. Journal of
Educational Research, 1923, February, 91-99. Democratic education must build
upon the various interests and abilities disclosed by mental tests a
superstructure of common faith and common knowledge.

A Study of the Use of the Stanford Revision of the Binet-Simon Test as a Guide
to Selection of High School Courses. Sara E. Weisman. Journal of Educational
Research, 1923, February, 137-144. Reports the achievement of 30 pupils in
courses selected for them on the basis of the abilities disclosed by the Binet test.
Thirteen case studies are given.

Training Teachers for Mental Testing, in Oakland, California. Virgil E.
Dickson, and Elise H. Martens. Journal of Educational Research, 1923, February,
100-108. Discusses the Oakland program for training teachers for mental
testing. Describes the methods of training and gives detailed data on the results.

Students on the Basis of Native Capacity and Accomplishment. Ira
A. Flinner. Educational Administration and Supervision, 1923, February,
87-98. The use of group tests, teachers' estimates, and individual tests in
classifying boys in a college preparatory school and in comparing native capacity and
accomplishment.

The Education of Mental Defectives in the Public Schools of Seattle. Harlan C.
Hines. School and Society, 1923, February 24, 216-221. How children of IQ's
from 56 to 80 are trained along industrial lines. Details of classification, class
work, and follow-up work are given, illustrated by case studies.

Teaching and Following-up Supernormal Children in a Small Public School.
Julia F. Keaney. Journal of Educational Research, 1923, February, 146-148.
Tells how the curriculum was enriched for a group of 27 boys of high IQ in a New
York public school.

Is Scientific Vocational Guidance Possible? John M. Brewer. School and
Society, 1923, March 10, 292-206. Discusses what use has been made of scientific
method in the field of vocational guidance and what remains to be done.

A Few Suggestions for Informal Testing in Geography. Edith P. Parker. The
Elementary School Journal, 1923, February, 444-447. Testing children's
knowledge of geographic principles by new pictures, maps and reading references.

The Validity of Arithmetical-reasoning Tests. R. V. Hunkins and F. S. Breed.
The Elementary School Journal, 1923, February, 463-466. Reports a study of
seven arithmetic tests in general use. Data secured from 127 children in Grades
V to VIII, Hot Springs, So. Dakota. Conclusions show Stone Reasoning Test
to be most valid, Buckingham's Scale for Problems in Arithmetic too difficult.
Monroe and Stone most useful for diagnosis of individual difficulties.

Miscellaneous

Scientific Tests in Education and Their Use. George C. Kyte. Educational
Administration and Supervision, 1923, March, 163-172. How the many
problems of promotion, classification, and diagnosis can be solved by the classroom
teacher through the use of mental and educational tests.

Individual Injustice and Guessing in True-false Examination. J. Crosby
Chapman. Journal of Applied Psychology, 1922, December, 342-348. A
caution against too much dependence on the operation of chance in true-false
examinations. Four tables present interesting data on scores in hypothetical
examinations ranging from 30 to 90 items.

When Children Read for Fun. Jenny Lind Green. School and Society, 1923,
April 7, 390-392. Reports an experiment to determine how reading for fun can
be affected by direct training in choice of material.

The Growth of Children as Influenced by Environmental and Hereditary
Conditions. Franz Boas. School and Society, 1923, March 17, 306-308. Growth
curves are influenced by different social environments but the effect of the
hereditary growth curve is greater.

Mental Fatigue of Mixed and Full Blood Indians. Thomas R. Garth. Journal
of Applied Psychology, 1922, December, 331-341. An experiment conducted
with 106 full blood Indians and 80 of mixed blood shows that the full bloods are
more willing to put forth effort and resist fatigue more successfully.

Education for Democracy. Alma Paschall. Educational Review, 1923,
April, 226-227. Discusses needed changes in the public school system to
provide for individual abilities and higher ethical ideals.

A Controlled Experiment to Determine the Extent to Which Latin can Function
in the Spelling of English Words. Warren W. Coxe. Journal of Educational
Research, 1923, March, 244-247. Describes one of the investigations being
carried on by the Advisory Committee of the American Classical League.

The Construction and Interpretation of Correlation Tables. E. L. Thorndike.
Journal of Educational Research, 1923, March, 199-212. Explanation and
illustration of a method of making correlation tables from given hypotheses about
the causes producing correlation.

Home Conditions of Study and Pupil-Attitude toward School Work. A
Sampling. E. T. Clayton. School and Society, 1923, February 24, 221-224.
Results of a questionnaire answered by 645 high school pupils of Concord,
N. H. Twelve tables present detailed data. No definite tendencies are revealed
by the study. Need for similar studies in other cities.

The Meaning of Behavioristic Psychology for Education. J. Herbert
Blackhurst. Educational Review, 1923, March, 1^160. Stresses the importance of
building up desirable reaction patterns early in the life of the child.

Problems of College Admission. Alexander C. Roberts. School and Society,
1923, March 3, 246-252. Will the so-called two-thirds rule be an adequate 
and just scheme of admission to college? A study of the high school marks and 
university records of 1129 individuals answers this question and raises others. 

The Progress of Kindergarten Pupils in the Elementary Grades. W. J. Peters.
Journal of Educational Research, 1923, February, 117-126. Reports a study of
the school progress of 374 fifth-grade children exactly half of whom had attended
kindergarten before entering first grade. Kindergarten expedites school life.

The Reading Vocabularies of Third-grade Children. C. A. Gregory. Journal
of Educational Research, 1923, February, 127-131. Reports the results of a study
of the minimum requirements of the state course of study of Oregon to
determine the minimum reading vocabularies of third-grade children. 6000-0000
words a conservative estimate.

A Basic List of Phonics for Grades I and II. Mabel Vogel, Emma Jaycox, and
Carleton W. Washburne. The Elementary School Journal, 1923, February,
436-443. A study of the frequency of occurrence of the phonograms in certain
vocabulary counts and readers with reference to the determination of a minimal
content of phonics for first and second grade.

Phonics and No Phonics. Lillian Beatrice Currier. Elementary School
Journal, 1923, February, 448-452. A report of a 5-year experiment eventuating
in 6 significant conclusions or recommendations.

A Textbook Score Card. E. M. Otis. Journal of Educational Research, 1923,
February, 132-136. Describes a score card designed for judging the value of
informational text books. A list of "standards" defines each item on the card.



NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


CONDUCTED BY LAURA ZIRBES¹

1. An Introduction to Psychology. To the student of psychology
who is not afraid of a fairly difficult book, this new volume² by
McDougall can be recommended. It is designed to introduce the
student to his science, but it is doubtful whether any teacher would
care to begin with a book so polemical in nature, a book that presents
so much of the theoretical background of psychology and which omits
a great deal of the content which, rightly or wrongly, is at present
supposed to constitute a course in psychology. To the more advanced
student, rather than to the beginner, the book will prove valuable. It
presents a well-reasoned account of a "purposive" psychology and
does not hesitate to challenge the structural and behavioristic
psychologies in favor at the present time.

The book is excellently written. It does not begin with a long
account of the nervous system, for which one reviewer at least is
thankful. The approach of the author is from the study of the
behavior of the lower animals up to a study of human behavior.
Instinct occupies, of course, a dominant role. Habit receives some
attention. Although there is no application of any of these topics to
educational theory or practice, the serious student of educational
psychology will read with interest and profit this well-knit presentation
of a system of psychology. R. P.

2. The Measurement of Teachers. To assert that age, experience,
quality of handwriting, intelligence as measured by tests and normal
school scholarship are singly of importance in predicting the degree of
success of grade school teachers is to argue from opinion rather than
knowledge. Dr. Knight³ in a laborious study shows that none of
these correlate above 0.15 with teaching ability of 153 teachers.

Amount of study while in service and ability to pass a professional
teaching test correlate with teaching success in the neighborhood of
0.33 and 0.60 respectively. By partial correlation analysis it is shown
that the latter is the sine qua non of teaching success. This the author
believes is the direct result of the teacher's interest in and application
to her job, and may be largely independent of amount of experience,
age, etc. Salary, when based on merit rather than experience, also
correlates in the neighborhood of 0.4 with teacher merit; the author
does not say which is the cart and which the horse.

¹ All unsigned reviews are prepared by Laura Zirbes.

² McDougall, William: "Outline of Psychology." New York: Scribners,
1923, pp. 469. Price $2.60.

³ Knight, F. B.: Qualities Related to Success in Teaching. T. C. Contributions
to Education, No. 120, New York, 1922, pp. 97.

Of immense practical value is the conclusion that all teacher rating
schemes, based on analyzed traits rather than "general merit," are
subject to the "halo" effect: the rater gets a general idea of the person
rated and then allows his general idea to affect his consideration of the
separate traits into which he might seek to analyze teaching ability.
In a New York City school district rating score card general
intellectual capacity correlated with voice to the extent of 0.62! The conclusion
might be drawn that score cards are at least highly wasteful of time if
not substantially useless.

It would seem to the reviewer that a composite of normal school
success measured on common examinations, amount of study while
teaching, standing on a common professional test, and salary attained
in a common school system might be used to give a rather accurate
composite measure of a teacher's fitness for promotion. Unfortunately
the author does not tell us what shall be done with the significant
"tests." A possible distinction between the use of them as prognostic
tests and as measures of progress might be drawn. The number of
high school teachers' records investigated is too small to draw valid
conclusions, but intelligence appears to be more important than
test-measured professional teaching ability.

The study, not the first in the field, is an excellent first approach to 
the complicated problem of teacher measurement. 

Herbert A. Toops. 


3. An Experimental Study of Complex Learning. Using a test
which embraces the features of the "multiple choice" experiments
frequently used with animals, a "checker puzzle" and the Tait
Labyrinth puzzle, Haught¹ has made an experimental and statistical

analysis of the interrelations of some of the higher learning processes
and their association with general intelligence. Stanford-Binet
scores are accepted as the criterion of intelligence, a somewhat
questionable practice with college students as subjects. The correlations
of intelligence and the several learning tests are uniformly low for
reasons not wholly clear inasmuch as the reliability of the learning tests
is not determined. The intercorrelations of the learning tests are
also low. Several types of scores for the learning tests are carefully
evaluated with reference to the criterion utilized. The author believes
that the puzzle tests afford a measure of several important mental
abilities, such as to control attention over long periods, to keep the
goal idea in mind without confusion, to systematically analyse very
complex situations, and other features of "rational" learning, more
thoroughly than the Binet or other tests of intelligence. Inasmuch as
the intercorrelations of the several tests do not fall into hierarchies
consistent with the Spearman "general factor" formula the author is
disposed to think of intelligence not as a single power or quality but
as "various factors variously grouped for different situations." On the
whole this is an admirable study of the higher mental processes
embracing many suggestions of value to the specialist in psychology and
mental testing. A. J. G.

¹ Haught, B. F.: The Interrelations of Some Higher Learning Processes.
Psychological Monographs, Princeton: The Psychological Review Co., 1921,
pp. 71.

4. Research Work at Vineland. No institution for the feebleminded
in this country has stimulated so much psychological research as has
The Training School at Vineland. A long list of books, monographs
and articles has come from that source. Now Mr. Porteus adds
another¹ to the list. This book represents the work he was engaged
in during the three years of his directorship of research. Much of it
has already appeared in various monographs and articles by the author.
It is well, however, to have it all in one volume, even although the topics
are very diverse. There are studies dealing with anthropometry,
intelligence tests and rating scales. Incidentally the author presents
still another definition of feeblemindedness, which runs as follows:
"A feebleminded person is one who by reason of mental defects, other
than sensory, can not attain to self-management and self-support to
the degree of social sufficiency."

The chief anthropometric contribution is a study of brain capacity. 
The author presents data from over 2000 normal cases from ages 7 to 

20. He finds an increase from age to age. He does not, however,
raise the question of selection in the older ages above the public school,
and it would be dangerous to accept his adult norms based on
university students as typical of the population at large. The significance
of the steady increase in brain capacity is, therefore, as hard to
interpret as would be the steady increase in intelligence test scores which
would be found in the population measured.

¹ Porteus, S. D.: Studies in Mental Deviation. No. 24, Publications of the
Training School at Vineland, Department of Research, October, 1922.

In the chapters dealing with intelligence tests, we have further
information about the Porteus Maze Tests, which are already
well-known. The author emphasizes the measure of "planning capacity"
which these tests give. They give higher correlation with industrial
capacity and social adaptability than does the Binet.

In the field of rating scales, we have a social rating scale and an 
industrial rating scale for the feebleminded. Both of these are rough 
and ready instruments, as the author realizes, but the work is valu¬ 
able and suggestive and will undoubtedly stimulate further research 
in this direction. The versatility of Mr. Porteus is further shown by 
his educational attainments scale for defectives, his form and
assembling test and his revision of the Stanford-Binet Scale.

From this brief description of the number of topics reported in the 
book, it can be readily imagined that much of the work is fragmentary 
in character, and from one point of view hardly worthy of being 
incorporated into a book. The only justification for much of it must 
be the hope that it may stimulate others to carry on where the author 
has left off. R. P. 


5. Elements of Human Psychology. One naturally compares this
new work¹ with the well-known and still recent (1919) Human
Psychology, by the same author. The purpose of the new and shorter
text is thus set forth in the preface: "This book was written to meet
numerous requests for an introductory textbook of psychology based
on the functions of the nervous system. The standpoint is the same
as that of 'Human Psychology,' which recognizes both the
introspective and behavioristic methods. Material has been freely drawn
from the earlier work, but the arrangement of topics is different and
the treatment has been simplified. Most of the theoretical discussions
are omitted and the practical applications of psychology are
emphasized."

¹ Warren, Howard C.: "Elements of Human Psychology." Boston: Houghton
Mifflin, 1922, pp. 416.

The abridgement, which amounts to about 15 per cent on the whole,
affects especially the chapters on the nervous system and the senses.
A few topics, such as memory and the subconscious, are treated at
considerably greater length than before. The figures are increased in
number, with the object of clearing up difficult topics. The practical
exercises are also more numerous, and, in addition, an appendix
provides a full set of review questions, as well as a few pages of
suggestions as to the best manner of teaching and studying the subject.
A novel feature which will be much appreciated is the expansion of the
index into a glossary, in which most of the terms used in the text are
carefully defined.

The author has gone minutely through the older text, with the
object of simplifying and illuminating every statement. Scarcely a
sentence is taken over bodily into the new text; almost always there is
some change in the direction of adaptation to the needs of the beginner.
The following short passage, as it appears in the two books, gives
some idea of the care and skill shown by the author in this difficult
task of simplification. In the "Human Psychology" we read:
"The inner ear or labyrinth is a very complicated cavity, only part of
which serves the auditory function. The dorsal portion contains the
semicircular canals and their appendages, which act as receptor for
the static sense." In the "Elements," this becomes: "The inner
ear or labyrinth is a very complicated cavity, only part of which is
concerned with hearing. The portion toward the back of the head
contains the semicircular canals, which are receptors for the static
sense; they have nothing to do with hearing." Much more extensive
alterations than this occur constantly throughout the text. Two
chapters in the earlier work which, taken together, outline the author's
system are, in the new book, combined into a compact view of the
whole. There are frequent retrospective and anticipatory summaries,
which serve admirably to keep the reader oriented. The style
throughout is certainly as direct and free from unnecessary difficulties as it
could well be made.

As to subject-matter, the new book, like the old, is characterized
by catholicity combined with system. When the author says that he
"recognizes both the introspective and behavioristic methods," he
means more than that he is willing to accept particular conclusions
reached by either method. He means that his system of psychology
has a logical place for each method and for the whole positive content
of both introspective and behavior psychology.

The book is essentially a system of psychology. By definition,
"Human psychology is the science which deals with the interaction
between man and his environment by means of the nervous system and
its terminal organs, together with the mental events which accompany
this interplay." This interaction of man and his environment
consists of responses to stimuli, and each response consists of three
principal parts: Reception of the stimulus by a sense organ, adjustment in
the nerve centers, and muscular or glandular activity. The
behavioristic method makes its contribution by examining the muscular or
glandular response in connection with the stimulus, but it does not
examine the important central process of adjustment. For
information on the process of adjustment, we turn to anatomical and
physiological study of the nervous system, and also to introspection. The
conscious experiences that are examined introspectively are
correlated with the adjustment processes in the nerve centers. "In other
words, the psychologist can study his thoughts and memories, his
perceptions and emotions, in place of the central nerve processes which
accompany them."

The central neural process of adjustment consists of several compo¬ 
nent processes, to each of which corresponds a fundamental mental 
process. The list is as follows: 

Neural Process              Mental Process

Excitation                  Impression
Conduction                  Suggestion (association)
Retention                   Revival
Fatigue (and freshness)     Attention
Collection                  Composition
Distribution                Discrimination
Modification                Transformation


Now what is examined introspectively consists of experiences (or
mental states) which are built up out of sensations by the processes
just listed. Experiences differ because they are compounded of
different sensations. There are three main classes of sensations:
those from the external senses, those from the systemic senses (organic
and pain senses), and those from the motor senses (muscle and static
senses). Also, there are revived sensations, which are of importance
only in case of the external senses. That gives four main classes
of sensory elements out of which experiences are compounded, and,
to correspond, there are four fundamental classes of experiences:
perceptions, composed chiefly of external sensations; images or ideas,
composed chiefly of revived external sensations; feelings, composed
chiefly of systemic sensations; and conations, composed chiefly of
motor sensations. There are also secondary experiences, composed of
systemic plus motor sensations; sentiments, composed of systemic and
revived external sensations; volitions, composed of motor and revived
external sensations. Language and thought, which belong together,
are like volitions in being compounded of motor sensations and revived
external sensations, but in their case the motor sensations arise in the
vocal organs in the course of social communication.

Frequent repetition of the same sort of experience gives rise to a
more or less permanent set of the nerve substance, corresponding to
which is a definite mental attitude. Perceptions and ideations set
into permanent interest (the cognitive or receptive attitude), feeling sets
into desire, and conation into attention (considered here as a motor
attitude). These are the primary attitudes, based upon the fundamental
classes of experience. There are also secondary attitudes,
based upon the secondary classes of experience. Thus, emotional attitudes,
or dispositions (such as cheerfulness, cowardice, malice, loyalty),
are based upon the repetition of similar emotions, and volitional attitudes,
or proclivities (as perseverance and vacillation), are based upon
repeated volitions. The repetition of similar intellectual processes
generates such attitudes as the retrospective, the imaginative, the
judicial, the analytic, and many others.

“Character arises from the consolidation of attitudes into more
permanent trends of life.” The consolidation of the intellectual attitudes
gives the intellectual phase of character, the feeling attitudes
compose temperament, the motor attitudes compose skill, and the
social attitudes constitute morality. These are the four phases of
character, and their summation constitutes the personality, “the
entire mental organization of a human being at any stage of his
development.” Thus personality is built up from elementary sensations
by a continued summation; and, in the same way, on the side of
motor response, there is a continued process of organization, from
reflexes through instinctive acts, learned performances (intelligent
behavior), and rational action, up to personal control of the entire
situation with which one is concerned.

As the preceding summary dimly suggests, the book represents a
very determined effort at systematization. It is rather a remarkable
performance in that line. In the present immature state of our
science, to be sure, any thoroughgoing system of psychology is bound
to be somewhat personal and arbitrary, and open to cheap and easy
criticism. For example, the author attempts to combine structural
and functional psychology in his system. He first defines the different
classes of experience in purely structural terms, as, a perception is a
compound of external sensations. But, as we advance into the chapter
on perception, we begin to read of the perception of objects, of weight,
of spatial relations. Here, then, we are considering perception as a
function which accomplishes certain results. Does the original
structural definition of perception still hold good? No, for it develops
that motor sensations are as important as any in the perception of
weight, size, etc. The structural definition does not hold strictly, if at
all, as soon as we begin to think in terms of function. Volition,
similarly, is first defined in a purely structural way, as an experience
compounded of motor sensations and revived external sensations.
But another, functional definition is also given, according to which
volition is “the kind of experience which accompanies ideomotor
actions.” According to the structural definition, I should experience a
volition if I chanced to have a visual image while walking or even
while being passively rotated; but, according to the functional definition,
this experience would not be a volition, because the motor
sensations in question are not produced by any motor effects of the
visual image.

Regarded as a serious attempt to show that structural psychology
can be taken over bodily into a psychology that is primarily a study of
certain functions of the organism, Warren's system thus leaves considerable
room for doubt. As a text for a discussion group, where the
stress is to be laid upon careful definition, scrutiny of implications, and
logical system, the present book should serve admirably. Nor is it
lacking in informational value, nor in practical hints. There are many
judicious educational suggestions scattered throughout the book.

R. S. Woodworth. 

Columbia University. 


6. The Technique of Curriculum Construction. This thoroughgoing
treatise¹ falls into two parts: the first, a statement of principles,
with their elaboration into procedures; the second, a compilation of 50
studies from various sources, illustrating eight school subjects and
miscellaneous fields which cannot be included under the first classification.
The book should not only fill a need as a textbook for graduate
classes. Part I is a handbook of investigational techniques, and is
permeated with a philosophy which every curriculum worker will do
well to consider in critical comparison with his own or the one to which
he subscribes.

¹ Charters, W. W.: “Curriculum Construction.” New York: Macmillan,
1923, pp. XII + 362.

7. A Brief for the Pre-school Child. The biological and psychological
significance of the first five years of life are clearly set forth in the
opening chapter of this book.¹ The history of infant and child welfare
work and of the nursery school movement is traced in its relation to the
home, the kindergarten and school entrance.

Educational provision for handicapped children below school age
is advocated both as a preventive and as a corrective measure. The
book is full of constructive suggestions for the conservation of childhood,
the improvement of parental care and educational preparation
for parenthood. The appendix contains a selected bibliography and
a wealth of other pertinent data. This book is not a report of Dr.
Gesell's research project at the Yale Psycho-Clinic. It is a text
comparable with Terman's book, “The Hygiene of the School Child.”
Its purpose is to indicate the vital interdependence between pre-school
and pre-parental education.

8. A Further Report on the Social Studies in the North Central

This bulletin² supplements and brings down to date the
earlier reports by L. V. Koos and C. O. Davis on the teaching of history
and citizenship in the secondary schools of the Middle West. (“The
Administration of Secondary Units.” The University of Chicago
Press, 1917, and “Training for Citizenship in the North Central Secondary
Schools.” The School Review, Vol. 28, pp. 263-282.) This Illinois
bulletin gives one a representative sampling of what social sciences are
now being taught, of the time allotment, the scope and content of such
courses, the textbooks used and some indications of the methods of
instruction that are followed. As the authors state, “No effort has
been made to interpret the facts;” but studied with the two earlier
reports, one may learn a great deal concerning what the North Central
Association High Schools have been teaching in the field of the social
sciences for the past seven or eight years. Earle Rirao.

¹ Gesell, Arnold: “The Pre-school Child.” New York: Houghton Mifflin,
1923, pp. yy + 264.

² Monroe, W. S. and Foster, I. O.: The Status of the Social Sciences in the
High Schools of the North Central Association. Bulletin 13, Bureau of Educational
Research, University of Illinois, Urbana, Ill., 1923.

Brief Notices of Other Publications Received

(Because this number is the last issue before the summer recess the brief
mention given below was deemed preferable to postponed review.)

1. A Study of the Rise of Civilizations. This is the first serious attempt
to analyze and relate the processes of human culture in systematic
fashion.¹ Students of sociology and ethnology will find the comparative
study of racial types facilitated by the illuminating organization
and presentation of data.

2. The Measurement of Emotion. An experimental study² of the
types of verbal reactions made to words presented orally by the association
method. The reaction time and the galvanometric reflexes are
recorded. The author finds evidence in support of Jung's hypothesis
that the association method is a useful device for uncovering emotional
complexes. He also develops a theory concerning the facilitating
and inhibiting effects of emotions on recall.


3. The Accomplishment Ratio. An account of the development and
theoretical basis of the accomplishment ratio together with the results
of its use in an elementary school. The author³ believes that there is
little specialization among school subjects and that all are essentially
perfectly correlated with general intelligence under favorable methods
of instruction.

A. I. G.


4. A Study of Questions. This⁴ is a brief report of investigations
into the frequency with which various types of questions are used in
secondary schools, and the character of the question as a specific
stimulus to mental activity. Twenty types of questions are considered
and the common faults of procedure in answering each type are related
to the shortcomings in the mental processes which lead to replies.

¹ Wissler, Clark: “Man and Culture.” New York: Crowell, 1923, pp. 371.

² Smith, Whately W.: “The Measurement of Emotion.” New York: Harcourt,
Brace and Co., 1922, pp. 184.

³ Franzen, Raymond: “The Accomplishment Ratio.” New York: Teachers
College, 1922, pp. 69.

⁴ Monroe, Walter S. and Carter, Ralph E.: The Use of Different Types of
Thought Questions in Secondary Schools and Their Relative Difficulty for Students.
Bulletin No. 34, Vol. XX, University of Illinois, Urbana, Illinois, 1923,
pp. 26.

5. A Type in Job Analysis. This book¹ is based on researches in
the field of commercial printing. It outlines in detail the method of
gathering data and developing a curriculum for the training of printing
executives. It should have a much broader appeal than this would
seem to indicate in view of the fact that numerous other problems of
personnel and curriculum could be solved by somewhat similar
procedures.

6. Four New Drawing Scales. This bulletin² contains a brief
summary of the data and technique used in the construction of four
scales for representative drawing. The four scales cover four types of
free-hand drawing.

7. Educational Implications of Mental Hygiene. Educators and
laymen who cannot accept the Freudian analysis of mental disorders
will find this volume³ helpful in the diagnosis and treatment of certain
common types of psychic disorder.


8. Mental Efficiency and Tobacco. This volume⁴ presents data
derived from observation, laboratory tests, introspection and biography.
The work is the first to be published in the name of a committee
organized in 1918 to study the tobacco problem.


9. The Basis of Social Behavior. After defining the scope of the
book the author⁵ discusses in succession the following significant
phases of the subject: the sense of social unity, social motives,
intellectual levels and psychic stability, racial factors, suggestibility,
the crowd, convention, custom and morale, social progress and adjustment.
The work is built on the latest researches in social science and
the related sciences. It is written in a style which will appeal to
student and general reader alike.

¹ Strong, Edward K., Jr. and Uhrbrock, Richard S.: “Job Analysis and the
Curriculum.” Baltimore: Williams and Wilkins, 1923, pp. 146.

² Kline, Linus W. and Carey, Gertrude L.: The Revised Kline-Carey Measuring
Scale for Free-hand Drawing. Part I, Representation. No. 6n. The
Johns Hopkins University Studies in Education. Baltimore: The Johns Hopkins
Press, 1923, pp. 10 + 4 scales.

³ Bousfield, Paul: “The Omnipotent Self, A Study in Self-deception and
Self-cure.” New York: E. P. Dutton and Company, 1923, pp. VII + 183.

⁴ O'Shea, M. V.: “Tobacco and Mental Efficiency.” New York: Macmillan,
1923, pp. 258.

⁵ Gault, Robert E.: “Social Psychology.” New York: Holt, 1923, pp. 336.

A NEW METHOD FOR DETERMINING THE SIGNIFICANCE
OF DIFFERENCES IN INTELLIGENCE
AND ACHIEVEMENT SCORES

TRUMAN L. KELLEY
Stanford University

The satisfaction that attaches to such a judgment as “Joe is a
bright boy, but his school work is not up to his ability,” or “John is
more capable in science than he is in language study” is probably due
to the very great momentousness of the judgment. Father, mother,
John himself, has a keen desire to know the likelihood of future success
in various callings and the demand for light is so insistent that
it is little wonder that not infrequently otherwise reputable teachers
and school principals proffer advice when they are really incompetent
to do so.

Without referring to the aggravation of the problem caused by
charlatans and human nature fakers, there has been a real difficulty
in it on account of the lack of an experimental means of checking
up one's judgments of the differences in the mental abilities of an
individual.

There has been a resort in recent years to an appraisal of scholastic 
success and promise by means of the Accomplishment Quotient. A 
child's pedagogical age, determined in a very fallible way (by class 
marks or scores in a school test) is divided by his mental age, likewise 
determined by fallible means (a group or individual intelligence test) 
and this quotient is taken as the ratio of what the child accomplishes 
to what he would have accomplished had he put forward just average 
effort. 

A reliable judgment of this sort would serve many purposes.
It is therefore of importance that the interpretation of such a quotient
be made only in the light of its probable error. Toops and Symonds

The Journal of Educational Psychology 


have discussed the merits of the accomplishment quotient¹ and
Chapman has shown² the unreliability of measures of difference
between educational and intelligence test scores. Though I find myself
in the main in agreement with Chapman's position I do not consider
the difficulty of making judgments of difference as great as he pictures
it, and for two reasons, one theoretical and the other practical. First,
he judges of the excellence of one fallible difference by comparing it
with a second fallible difference, whereas he would have obtained a
truer idea of the significance of his fallible measure had he compared it
with a true difference; second, the illustrations he gives are not well
chosen as the functions or tests involved are not as disparate as are
other functions for which we can readily obtain fairly reliable scores.

The logical approach to this problem would be to ascertain by
experimental and statistical analysis disparate mental functions in
mankind, or at least in childhood, and then devise tests to measure
these functions separately. Such an approach would utilize the best
of our fallible mental measures and endeavor to determine the degree
of community or disparity between the traits involved, provided they
were measured perfectly, and would seize upon any found to be disparate
when so measured. Total disparity between two traits means, of
course, that a given amount of the one trait presages nothing as to
the probable amount of the other. The finding of these traits would
constitute the foundation for the important practical problems of
differentiation of abilities.

It is obvious that two intrinsically disparate traits, if measured in
a very unreliable manner, will not permit of reliable judgments of the
sort “John is more capable in the first than in the second.” On the
other hand if two so-called traits are intrinsically the same then
inaccurate measures of them will proclaim differences which in fact do
not exist. Thus errors in our measuring devices cloud real differences
and bring forward spurious ones. The subject matter of this article
is the validity of differentiation of abilities upon the basis of scores
obtained in ordinary achievement tests and not the more fundamental
and logically antecedent one of discovery of disparate traits.

¹ This journal, Dec., 1922 and Jan., 1923.

² This journal, Feb., 1923.

We will not deal with quotients of measures, but with differences.
Let there be two tests, say arithmetic and spelling, resulting in scores
$X_1$ and $X_2$ with means for the group studied $M_1$ and $M_2$ and standard
deviations $\sigma_1$ and $\sigma_2$. We cannot immediately interpret a difference
$(X_1 - X_2)$ in a child's arithmetic and spelling test scores because the
units of measurement are very different. Thus if a fifth grade pupil,
A, secures scores of 25 and 80 upon the two tests we cannot forthwith
judge of his relative standing in arithmetic and spelling. We must
first express those scores as deviations from the grade means divided
by the standard deviations. Such derived scores we will call standard
scores, because their standard deviations are equal to 1.0, and represent
by the symbols $z_1$ and $z_2$. Thus




$$z_1 = \frac{X_1 - M_1}{\sigma_1}, \qquad z_2 = \frac{X_2 - M_2}{\sigma_2} \tag{1}$$


If $M_1 = 15$, $M_2 = 70$, $\sigma_1 = 5$, $\sigma_2 = 20$, we have for the example given,
$z_1 = (25 - 15)/5 = 2.0$, and $z_2 = (80 - 70)/20 = .5$. Accordingly A,
being 2.0 standard deviations above the grade mean in arithmetic
and .5 standard deviations above in spelling, is 1.5 standard deviations
better in arithmetic than in spelling. Let us call such a difference $d$.
Thus

$$d = z_1 - z_2 \tag{2}$$
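The arithmetic of formulas (1) and (2) may be set out as a short sketch. The function name `standard_score` is merely illustrative and does not come from the article; the figures are those of the example just given.

```python
def standard_score(raw, mean, sd):
    """Deviation of a raw score from the group mean, in standard-deviation units (formula 1)."""
    return (raw - mean) / sd


# Pupil A of the worked example: arithmetic score 25, spelling score 80.
z1 = standard_score(25, mean=15, sd=5)    # arithmetic
z2 = standard_score(80, mean=70, sd=20)   # spelling
d = z1 - z2                               # formula (2)

print(z1, z2, d)
```

Run as written, the sketch gives the 2.0, .5 and 1.5 quoted in the text.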

Is this difference significant or is it a chance difference, due to the fact
that we have inaccurate measures of both arithmetic and spelling
abilities?

Let us assume a second individual, B, obtaining a $d = -1.0$, i.e.,
B is one standard deviation better in spelling than in arithmetic. If
we have $N$ pupils there will be $N$ values of $d$ and the standard deviation
of these $d$'s is, by the usual formula for the standard deviation of a
difference,

$$\sigma_d = \sqrt{\sigma_{z_1}^2 + \sigma_{z_2}^2 - 2 r_{12}\sigma_{z_1}\sigma_{z_2}} = \sqrt{2 - 2 r_{12}} \tag{3}$$

in which $r_{12}$ is the correlation between the arithmetic and spelling
scores.
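Formula (3) is easily evaluated for any correlation. The sketch below is illustrative only: the function name is not the author's, and the correlation of .6 is a hypothetical value chosen for the demonstration, not one reported in the article.

```python
import math


def sd_of_differences(r12):
    """Standard deviation of d = z1 - z2 over a group, when both z's have unit SD (formula 3)."""
    return math.sqrt(2.0 - 2.0 * r12)


# r12 = 0.6 is a hypothetical correlation used only for illustration.
print(round(sd_of_differences(0.6), 3))
```

With perfectly correlated scores the group standard deviation falls to zero, and with uncorrelated scores it rises to the square root of 2.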

It is important to note that this is the standard deviation of the
distribution of $d$'s, but it is not at all the standard error of a single $d$.
This may be made clear by considering the difference between height
and weight, two measures which we can obtain with very high reliability.
Let us say that for two individuals, A and B, we obtain two
height-weight $d$'s, 1.5 and -1.0 respectively. Were we to measure
A in both height and weight a number of times we might, due to
inaccuracies in weighing and measuring height, obtain a series of $d$'s,
1.502, 1.498, 1.501, etc. Similarly for B, -1.005, -1.004, -1.000,
etc. These two series of $d$'s are not samplings of the same thing.
Individual A is tall for his weight and individual B is short for his and
that is the truth of the situation. The differences found are substantially
accurate so that the standard error of these $d$'s should be a
very small amount, let us say .002, and not at all of the order of
magnitude, roughly .9, given by formula (3).

What we wish to know is the deviation of the $d$'s, obtained for
individual A by successive testings of him, from his true $d$. Let
us define the true difference for individual A as equal to his true $z_1$
score minus his true $z_2$ score, and let us define his true $z_1$ score as his
average score in arithmetic if tested by means of similar tests and under
similar conditions an infinite number of times. We will define his
true $z_2$ score in the same way except that we are here dealing with
spelling and will designate these true scores by $z_\infty$ (z sub infinity) and
$z_\omega$ (z sub omega).

A single $d$ ($= z_1 - z_2$) obtained for individual A is similar to a
second one obtained for him by means of additional $X_1$ and $X_2$ measures
because individual A is the same throughout, i.e., $z_\infty$ and $z_\omega$ have
not changed. Thus the standard error which we wish is not $\sigma_d$ of
formula (3), but $\sigma_{d.\infty\omega}$, that is, the standard deviation in the $d$'s for
constant values of $z_\infty$ and $z_\omega$. The magnitude $\sigma_{d.\infty\omega}$ is of the type
familiar to students of multiple correlation. Those who are not
will need to skip the next paragraph giving the evaluation of this
standard error. In deriving it, certain basic formulas are required.
The original derivations of these have appeared in various places. I
have elsewhere¹ given the derivation of each of them, and as these
derivations are lengthy they will not be repeated here. The needed
formulas are:

(a) $r_{1\infty} = \sqrt{r_{1I}}$. The correlation between a fallible score and
a true score of the same function is equal to the
square root of the reliability coefficient.

(b) $r_{1\omega} = r_{12}/\sqrt{r_{2II}}$. The correlation between a fallible score and
a true score of a different function is equal to
the correlation between the fallible scores in
the two functions divided by the square root
of the reliability coefficient of the second
fallible score.

(c) $\sigma_{x_1 - x_2} = \sqrt{\sigma_1^2 + \sigma_2^2 - 2 r_{12}\sigma_1\sigma_2}$

(d) $\sigma_{1.23} = \sigma_1\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{12.3}^2}$

(e) $r_{12.3} = \dfrac{\sum x_{1.3}\,x_{2.3}}{N\,\sigma_{1.3}\,\sigma_{2.3}}$

(f) $x_{1.2}$ is a residual, that is, it is that part of $x_1$ which is independent
of $x_2$.

(g) In case $z = \dfrac{X - M}{\sigma}$, then $\sigma_z = 1.0$.

¹ Kelley: “Statistical Method.” Macmillan, 1923.

With these transformations available we may derive $\sigma_{d.\infty\omega}$:

$$\sigma_{d.\infty\omega}^2 = \frac{\sum (z_{1.\infty\omega} - z_{2.\infty\omega})^2}{N} = \sigma_{1.\infty\omega}^2 + \sigma_{2.\infty\omega}^2 - 2 r_{12.\infty\omega}\,\sigma_{1.\infty\omega}\,\sigma_{2.\infty\omega} \quad \text{By (c)} \tag{4}$$

$$\sigma_{1.\infty\omega}^2 = \sigma_1^2 (1 - r_{1\infty}^2)(1 - r_{1\omega.\infty}^2) = 1(1 - r_{1I})(1 - 0) = 1 - r_{1I} \quad \text{By (g), (a), (e), (f)} \tag{5}$$

The correlation $r_{1\omega.\infty}$ is readily shown to be equal to zero:

$$r_{1\omega.\infty} = \frac{\sum z_{1.\infty}\,z_{\omega.\infty}}{N\,\sigma_{1.\infty}\,\sigma_{\omega.\infty}} \tag{6}$$

Now $z_{1.\infty}$ is the residual in $z_1$ for a constant $z_\infty$, i.e., $z_{1.\infty}$ is that part
of $z_1$ which is unrelated to $z_\infty$. Since however $z_\infty$ is the true score
in the function and $z_{1.\infty}$ is that part of the obtained score which is
unrelated to the true score it is simply the chance element in the $z_1$
score. Being thus merely a chance factor it will have a true correlation
with no other measure whatsoever and therefore not with
$z_{\omega.\infty}$, so we obtain $r_{1\omega.\infty} = 0$. This same result may be obtained at
the expense of more labor and less logic by expanding by the
usual multiple correlation formula for three variables and making the
necessary substitutions. The substitutions required are (a), (b), (d),
(e), (f), (g). A similar derivation gives

$$\sigma_{2.\infty\omega}^2 = 1 - r_{2II} \tag{7}$$

Finally, $r_{12.\infty\omega}$ is the correlation between two chance factors (the
residuals in $z_1$ independent of true values of $z_\infty$ and $z_\omega$, and the residuals
in $z_2$ independent of them) and it therefore equals zero. This result
may likewise be obtained by using the usual multiple correlation
formula for four variables, expanding, substituting and simplifying.
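The argument of the preceding paragraphs amounts to saying that the error in an obtained difference is the difference of two uncorrelated chance elements whose variances are one minus the respective reliability coefficients, so that its standard deviation should be the square root of (2 - $r_{1I}$ - $r_{2II}$). A minimal numerical check may be sketched as follows. It assumes normally distributed, independent chance elements, which is an assumption of the sketch rather than a statement in the article, and the reliabilities of .8 and .9 are hypothetical.

```python
import math
import random


def se_of_difference(rel1, rel2):
    """Standard error implied by the derivation: sqrt(2 - r_1I - r_2II)."""
    return math.sqrt(2.0 - rel1 - rel2)


def simulate_error_sd(rel1, rel2, n=50000, seed=0):
    """Monte Carlo SD of (obtained d - true d) for unit-variance obtained scores.

    Each obtained z is taken as a true part plus an independent normal chance
    element of variance (1 - reliability); the true parts cancel in the
    error of d, so only the chance elements need be drawn.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        e1 = math.sqrt(1.0 - rel1) * rng.gauss(0.0, 1.0)
        e2 = math.sqrt(1.0 - rel2) * rng.gauss(0.0, 1.0)
        err = e1 - e2
        total += err * err
    return math.sqrt(total / n)


# Hypothetical reliabilities, for illustration only.
print(round(se_of_difference(0.8, 0.9), 3), round(simulate_error_sd(0.8, 0.9), 3))
```

The two printed figures should agree closely, the small remaining discrepancy being sampling fluctuation in the simulation.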

We thus obtain the important and very simple formula

$$\sigma_{d.\infty\omega} = \sqrt{2 - r_{1I} - r_{2II}} \tag{8}$$

This formula fills a long felt need since it makes possible the determination
of the probable errors of our judgments of difference of
abilities within the individual.

$$PE\ (\text{of individual } z_1 - z_2) = .6745\sqrt{2 - r_{1I} - r_{2II}} \tag{9}$$

The accompanying table of means, standard deviations, and reliability
coefficients for the separate tests in the Stanford Achievement
battery was obtained from four Palo Alto eighth grade classes.
The population is 96. This table provides us with the constants
needed in obtaining standard scores ($z$ scores) and the probable errors
of the differences in standard scores.

Table I

Test                                   Mean    Standard     Reliability
                                               deviation    coefficient

Arithmetic computation                 36.0      8.94          .609
Arithmetic reasoning                   27.8      6.21          .825
Arithmetic total                       63.8      7.20          .800
Word meaning                           60.2     10.0           .044
Sentence meaning                       64.2     10.8           .806
Paragraph meaning                      46.4      6.26          .848
Weighted reading total                222.2     30.4           .943
Language usage                         38.1      8.00          .729
Spelling                              170.0     10.2           .910
Science information                    78.3     10.8           .772
History and literature information     63.8     18.1           .914
Weighted grand total                  825.8     88.1           .954

The test records of four pupils are recorded in the following table.
Pupils A and B are of opposite type and show extreme inequality in
achievement record; pupil C is typical of the bright symmetrically
developed child, and pupil D has the lowest total score, shows marked
inequality in ability and being nearly seventeen years old is likely to
be in need of educational and vocational counsel.

Table II

Test                                       Scores of pupils
                                        A       B       C       D

Arithmetic computation                  30      36      40      39
Arithmetic reasoning                    24      32      33      11
Arithmetic total                        54      68      73      50
Word meaning                            78      62      74      36
Sentence meaning                        68      66      67      26
Paragraph meaning                       61      40      61      23
Weighted reading total                 248     188     243     107
Language usage                          40      26      48      20
Spelling                               144     158     204     170
Science information                     76      73      81      59
History and literature information      76      72      80       9
Weighted grand total                   806     789     948     665

Expressing the scores of the same pupils as standard scores we obtain 
Table III. 

Table III

Test                                   Standard scores of pupils
                                        A       B       C       D

Arithmetic computation                -1.5      .0     1.0      .8
Arithmetic reasoning                   -.7      .8     1.0    -3.2
Arithmetic total                      -1.4      .0     1.3    -1.9
Word meaning                           1.1    -1.3      .8    -2.8
Sentence meaning                        .4     -.8      .3    -3.7
Paragraph meaning                [illegible]   -.0      .9    -3.6
Weighted reading total                  .8  [illegible] [illegible] -3.8
Language usage                          .9  [illegible] [illegible] -2.0
Spelling                              -1.4  [illegible] [illegible]   .0
Science information                     .2      .0      .7    -1.3
History and literature information      .7      .6      .9    -3.0
Weighted grand total                   -.2     -.4     1.4    -3.0

Pupil D is markedly weak in arithmetic reasoning, reading, and
history and literature information, slightly better in science information
and much better in spelling and arithmetic computation. With
such other items as are readily available (sex, disciplinary record,
expressed aspirations, etc.) the kind of advice to give this pupil would
be obvious, provided we can trust these differences. Let us therefore
calculate, by formula (9), the important probable errors:

Difference: Arithmetic computation - Arithmetic reasoning =
4.0 standard deviations. PE of this difference = .6.

Difference: Spelling - Reading total = 3.8 standard deviations.
PE of this difference = .3.

Difference: Science information - History and literature information
= 1.7 standard deviations. PE of this difference = .4.

These probable errors are so small with reference to the differences
involved that the existence of very important differences may be taken
as definitely established. This child, as the result of original nature or
training, it probably matters not which, approaches the end of school
and the seventeenth birthday with a definite interest and mental bias.
Is it not to be expected that the child's weal or woe lies in the
use or disuse of this bias in vocational and recreational life?

Few cases show as pronounced idiosyncrasy as this pupil D, but
two-thirds of the pupils, as will be shown, reveal differences in excess of
the chance differences due to the fallible nature of the measures used.

Pupil C, age 12½, is probably a great comfort to parents and
teachers; always doing well and the cause of no perplexity unless to
the counselor who is asked to advise a specific vocation or course of
study. The only difference which is quite certainly significant is that
between Sentence Meaning and Spelling (difference = 1.4; PE of
difference .4). Advice to pay more attention to the meaning of
sentences and less to the spelling of the words in them is suggested,
but it is quite unnecessary as the child will certainly do this anyway
in the higher school grades. A little extra spelling ability at this age
and grade is no serious cause for worry.

Pupils A and B are two types which are found sufficiently often
among these 96 pupils to raise the question whether there are not in
truth “multiple types” such as Thorndike,¹ utilizing such experimental
data as the tests of 10 years ago made possible, found no evidence to
support. Thorndike has long held that there is unevenness in development,
but it now seems quite possible that the unevenness will be
found not to be random but to fall into types. For pupil A we have

Difference: Arithmetic total - Reading total = -2.2 standard
deviations. PE of this difference = .3.

Difference: Reading total - Spelling = 2.2 standard deviations.
PE of this difference = .3.

Here again the differences are indubitably significant. For pupil
B we have:

Difference: Arithmetic total - Reading total = 1.7 standard deviations.
PE of this difference = .3.

Difference: Arithmetic total - Spelling = 1.3 standard deviations.
PE of this difference = .4.

¹ “Educational Psychology,” Vol. III, Chap. XVI.
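The probable errors quoted for pupils A and B may be recomputed from formula (9) and the reliability coefficients of Table I. The sketch below is illustrative: the dictionary keys and the function name are not the author's, and only the three reliability coefficients needed for these pairs are entered.

```python
import math

# Reliability coefficients as printed in Table I.
RELIABILITY = {
    "arithmetic_total": 0.800,
    "reading_total": 0.943,
    "spelling": 0.910,
}


def probable_error(test_a, test_b):
    """Probable error of an individual's difference in standard scores (formula 9)."""
    return 0.6745 * math.sqrt(2.0 - RELIABILITY[test_a] - RELIABILITY[test_b])


pairs = [
    ("arithmetic_total", "reading_total"),
    ("reading_total", "spelling"),
    ("arithmetic_total", "spelling"),
]
for a, b in pairs:
    print(a, "-", b, round(probable_error(a, b), 1))
```

To one decimal place these come to .3, .3 and .4, in agreement with the figures given for the two pupils.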

The search for the meaning in terms of human nature and educational
psychology of such differences is very enticing, but let us first
endeavor to ascertain the types of differences which are most apparent
by the aid of the measuring devices represented by the Stanford
Achievement tests. The frequency with which differences of various
sorts are revealed by these fallible tests will not exactly parallel the
importance of the differences in the natures of the pupils studied,
because the tests are not equally reliable. Thus, if given four traits,
a, b, c, d, such that children intrinsically vary as much in the difference
a - b as in that c - d and given further measures of a and b which
are more reliable than those of c and d, then we will be able to discover
and determine differences a - b more often than differences c - d.
We must therefore keep in mind that the ease with which differences
are discovered by the aid of the Stanford Achievement Tests depends
upon: first, the extent to which the individuals differ within themselves
and second, upon the reliability of the tests. With this in mind let us
seek to determine the proportion of cases in which the difference
$z_1 - z_2$ (e.g., Computation - Arithmetic reasoning) is so great as to be
significant.

The standard deviation of such a difference for a given pupil is

$\sigma_{d_1} = \sqrt{2 - r_{1I} - r_{2II}}$

If the distribution of differences for the entire population should have
the same standard deviation as this, then, obviously, the obtained
differences are no greater than chance indicates. However, the
standard deviation for the group, of obtained differences, is

$\sigma_d = \sqrt{2 - 2r_{12}}$

and this standard deviation is greater than the former for every
combination by twos of the tests in the Stanford battery. The type
situation is pictured in the accompanying figure. The dotted curve
is a distribution of standard deviation = $\sigma_{d_1}$ (the chance
value for a single pupil) and the full line curve is a distribution of
the same total area of standard deviation = $\sigma_d$. Should the full
line curve coincide with the dotted line curve then the differences
found would, on the whole, be no greater than chance suggests, but if
the full line curve has the greater standard deviation then, in the
proportion of cases represented by the shaded area, the obtained
differences are greater than chance suggests. The figure drawn
represents substantially that for Computation and Arithmetic Reasoning,
and indicates that approximately one-quarter of the pupils showed
differences in computation and arithmetic reasoning abilities, when
measured by the two Stanford Achievement Tests, greater than chance.
The proportion represented by the shaded area depends upon the ratio of
the standard deviations, $\sigma_{d_1}$ and $\sigma_d$. To obtain this
proportion knowing the ratio, Table IV is given.

[Figure: The shaded area is 26 per cent of the total.]


Table IV

Ratio = $\sigma_{d_1}/\sigma_d$. Proportion = proportion of differences
in excess of the chance proportion.

  Ratio   Proportion      Ratio   Proportion      Ratio   Proportion
   .02       .950          .35       .467          .70       .171
   .05       .888          .40       .415          .75       .138
   .10       .798          .45       .367          .80       .108
   .15       .719          .50       .323          .85       .078
   .20       .647          .55       .28           .90       .051
   .25       .582          .60       .242          .95       .025
   .30       .522          .65       .205          .99       .005
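
The entries of Table IV can be checked numerically. The sketch below is one reading of the construction described above, not the author's own procedure: two normal curves of equal area are taken, the narrower with standard deviation equal to the ratio times that of the wider, and the shaded proportion is the area by which the wider curve exceeds the narrower beyond their crossing points. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def excess_proportion(ratio):
    """Area by which the wider of two equal-area normal curves exceeds
    the narrower one beyond their crossing points.

    ratio = (smaller standard deviation) / (larger standard deviation).
    """
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must lie strictly between 0 and 1")
    # Work in units of the larger standard deviation. The two ordinates
    # are equal at +/- x, where x solves ln(1/ratio) = x^2 (1/ratio^2 - 1) / 2.
    x = math.sqrt(2.0 * ratio**2 * math.log(1.0 / ratio) / (1.0 - ratio**2))
    phi = NormalDist().cdf
    # Tail area of the wide curve minus tail area of the narrow curve.
    return 2.0 * (phi(x / ratio) - phi(x))


for k in (0.10, 0.50, 0.75):
    print(f"ratio {k:.2f}: proportion {excess_proportion(k):.3f}")
```

Under this reading the computed values agree with the legible tabled entries to within about one unit in the third decimal place.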


For these four sections of eighth grade pupils the correlations
between the various tests in the Stanford Achievement battery are as
given in Table V. These intercorrelations permit the calculation of
$\sigma_d$, taking the tests in pairs.

Having the intercorrelations of Table V and the reliability
coefficients of Table I we can calculate the ratio
$\sigma_{d_1}/\sigma_d$ for every pair of tests and obtain by Table IV
the percentage of pupils showing differences in ability sufficiently
great that they cannot be attributed to chance. Table VI gives these
results in the case of the ninety-six eighth grade pupils.

Table V. Intercorrelations between the Tests in the Stanford
Achievement Battery

Columns, in order: Computation, Arithmetic reasoning, Arithmetic total,
Word meaning, Sentence meaning, Paragraph meaning, Reading total,
Language usage, Spelling, Science information.

Arithmetic reasoning:                 .224
Word meaning:                         ...
Sentence meaning:                     ... .738
Paragraph meaning:                    .484 ... .743 .646
Reading total:                        ...
Language usage:                       .260 .201 ... .600 .810 ...
Spelling:                             .329 .167 ... .472 .284 .616 .477 ...
Science information:                  .178 .668 ... .632 .464 ... ... .322 .342
History and literature information:   .120 .607 .432 .674 .620 ... ... .439 .385 .764

Table VI. Percentage of Differences in Individual Test Scores in
Excess of the Chance Percentage

Columns, in order: Computation, Arithmetic reasoning, Arithmetic total,
Word meaning, Sentence meaning, Paragraph meaning, Reading total,
Language usage, Spelling, Science information.

Arithmetic reasoning:                 26
Word meaning:                         37 38
Sentence meaning:                     33 33 ... 18
Paragraph meaning:                    28 27 ... 22 17
Reading total:                        ... 30 ...
Language usage:                       28 28 29 20 18 14 20
Spelling:                             27 41 37 44 37 32 44 20
Science information:                  20 18 20 28 22 20 26 24 33
History and literature information:   33 31 33 ... 27 31 27 33 10


Very interesting evidence of lack of relationship between certain
scholastic achievements is found in this table. The most striking
feature of the table is that there are no two functions which do not
show substantial disparity. The two which can be differentiated
least readily are Science Information and History and Literature
Information. Of all the tests these are the least definitely connected
with the subjects in the elementary school curriculum.

It is very interesting to note that the knowledge of the meaning of
words and of the spelling of words do not develop in a parallel manner,
for no less than 44 per cent of these eighth grade children show
measurable differences in relative ability in these two subjects. On the
other hand Language Usage and Paragraph Meaning are more closely
allied, as but 14 per cent show significant inequality in development in
these two subjects. These observations apply merely to the differences
in ability as revealed by the tests and not to intrinsic differences
within the children. We may, however, infer from the data at hand
what the percentages of differences in excess of chance would be
were all the tests equally reliable.

The mean reliability coefficient, weighting all the tests equally
and omitting Arithmetic Total, Reading Total, and Grand Total
reliabilities, is .824. The correlation between Computation and
Arithmetic Reasoning is .224, but if the two tests were perfectly
reliable this correlation would be

$r_{\infty\omega} = \frac{r_{12}}{\sqrt{r_{1I}}\,\sqrt{r_{2II}}} = .302$

in which $r_{\infty\omega}$ is to be read "the estimated correlation
between true scores in the first and second traits." It is the
correlation corrected for attenuation by Spearman's formula. Reversing
the calculation, if the two tests, instead of being perfectly reliable,
each have reliabilities .824, then $r_{12}$ would equal

$r_{\infty\omega}\sqrt{(.824)(.824)} = (.302)(.824) = .248$

Under these conditions, i.e., having Computation and Arithmetic
Reasoning tests of reliability .824, we would have

$\frac{\sigma_{d_1}}{\sigma_d} = \frac{\sqrt{2 - .824 - .824}}{\sqrt{2 - 2(.248)}} = .484$

Entering Table IV with this value we obtain .337, the proportion of
differences in excess of chance between Computation and Arithmetic
Reasoning provided both tests have reliability .824. Making similar
calculations for the other pairs of tests yields the values of Table VII.
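
The chain of calculations just described can be followed step by step in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and the inputs (.302, .824, .248) are the figures quoted in the text.

```python
import math


def correct_for_attenuation(r12, rel1, rel2):
    """Spearman's correction: estimated correlation between true scores."""
    return r12 / math.sqrt(rel1 * rel2)


def attenuate(r_true, rel1, rel2):
    """Correlation to be expected between two tests of given reliabilities."""
    return r_true * math.sqrt(rel1 * rel2)


def chance_ratio(r12, rel1, rel2):
    """Ratio of the chance standard deviation of a difference between
    standard scores to the standard deviation of obtained differences."""
    sigma_chance = math.sqrt(2.0 - rel1 - rel2)
    sigma_obtained = math.sqrt(2.0 - 2.0 * r12)
    return sigma_chance / sigma_obtained


# True-score correlation of Computation and Arithmetic Reasoning, from the text.
r_true = 0.302
# Correlation expected if both tests had reliability .824 (about .249;
# the text carries .248).
r_equal = attenuate(r_true, 0.824, 0.824)
# Ratio with which Table IV is entered.
ratio = chance_ratio(0.248, 0.824, 0.824)

print(f"r12 at equal reliability: {r_equal:.3f}")
print(f"ratio of standard deviations: {ratio:.3f}")
```

Repeating the last two steps for each pair of tests, with the corrected correlation for that pair, is what the calculation for Table VII requires.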

Whereas we can, with the exercise of some care, obtain a spelling
test of reliability .91, it is with great difficulty that we can obtain
a computation or language usage test of reliability .76. Accordingly a
situation in which all the tests have equal and high reliability is not
likely to be obtained objectively, but if attained we should expect
about the results shown in Table VII.
Table VII. Percentage of Differences in Individual Test Scores in
Excess of the Chance Percentage in Case the Reliability of Each Test
Is .824

                                     Comp.  Arith.  Word   Sent.  Par.   Lang.  Spell.  Sci.
                                            reas.   mng.   mng.   mng.   usage          info.
Arithmetic reasoning                  34
Word meaning                          30     30
Sentence meaning                      40     26     13
Paragraph meaning                     36     26     14     17
Language usage                        30     33     24     23     17
Spelling                              31     36     25     33     26     30
Science information                   35     20     24     26     21     30     31
History and literature information    37     26     20     20     20     27     31      9


From this table we learn that Computation is more similar to
Spelling (31 per cent of differences in excess of chance) than to any
other of the tests, not excepting Arithmetic Reasoning, and that it
is most different from Sentence Meaning (40 per cent of differences in
excess of chance). Arithmetic Reasoning is most closely allied to
Science Information and least to Spelling. Word Meaning, Sentence
Meaning and Paragraph Meaning are mutually related. Each is also
quite similar to History and Literature Information and quite dissimilar
to Computation. Language Usage is most similar to Paragraph
Meaning and least to Arithmetic Reasoning. Science Information
shows the strongest bond with History and Literature Information,
and the weakest with Computation. History and Literature
Information, in addition to its similarity to Science Information, is
related with Reading and inversely with Computation.

The probable errors of these percentages of differences in excess of
chance are not known but are probably not large, as the results point
uniformly to certain types of relationship and do not wabble around
as would chance results. Repeated and further investigations are of
course necessary, but the indications are that eighth grade children are
in fact "real persons" with specific and unique mental mechanisms. The
charge, sometimes made, that standardized educational processes have
killed all individuality is not borne out by the findings. When using
the Stanford Achievement Tests two-thirds of the children reveal one
or more significant inequalities in levels of attainment in different
subjects and there are undoubtedly still other important mental
differences not as yet revealed.


















THE ACHIEVEMENT QUOTIENT TECHNIQUE
G. M. RUCH
University of Iowa

Introduction. The very welcome critical article by Toops and
Symonds¹ which recently appeared in this journal has gone a considerable
way in the clearing up of the fundamental issues in the AQ
technique, although, as the authors point out, it has raised many more
problems than it has attempted to settle. The present writer is
particularly grateful to these authors since it enables him to plunge
more directly into the discussion of certain other issues.

There are today apparently three batteries of tests, in addition to
the ones adapted by Franzen for his purposes, which cover two or
more of the most important elementary school subjects. These are
the Illinois Examination² by Monroe and Buckingham, the Lippincott-
Chapman Classroom Products Survey Tests³ by Chapman, and the
Stanford Achievement Test⁴ by Kelley, Ruch and Terman. The
first includes scales for the measurement of reading, arithmetic and
general intelligence. The second measures only arithmetic and reading.
The last includes reading, arithmetic, spelling, language and
grammar, geography and elementary school science, and history
and literature (six separate subjects and nine tests in all).

The fullest discussion of the significance of the AQ as an educational
instrument is given by Franzen in his monograph entitled "The
Accomplishment Ratio."⁵ Pintner and Marshall⁶ have also used an
analogous technique in the attempt to measure motivation in terms of

¹ Toops, H. A. and Symonds, P. M.: What Shall We Expect of the AQ?
Journal of Educational Psychology, Vol. XIII, 1922, pp. 513-528 and
Vol. XIV, 1923, pp. 27-38.

² Monroe, W. S. and Buckingham, B. R.: "The Illinois Examination,
Teachers Handbook." Bureau of Research, University of Illinois, 1920.

³ Chapman, J. C.: "The Lippincott-Chapman Classroom Products Survey
Tests." J. B. Lippincott Co., Philadelphia, 1920-21.

⁴ Kelley, T. L., Ruch, G. M. and Terman, L. M.: "The Stanford
Achievement Test." World Book Co., Yonkers, N. Y., 1923; especially the
"Manual of Directions," pp. 62-68.

⁵ Franzen, R.: "The Accomplishment Ratio." Teachers College
Contributions to Education, No. 125, 1922.

⁶ Pintner, R. and Marshall, H.: A Combined Mental-educational Survey,
Journal of Educational Psychology, Vol. XII, 1921, pp. 32-43.
the differences between their "Educational Index" and the "Mental
Index."

Nature of the Present Study. The present study reports comparisons
between the educational ages (EA's) and achievement quotients
(AQ's) obtained by the use of the Illinois Examination, the
Lippincott-Chapman Classroom Products Tests, and the Stanford
Achievement Test. All three of these examinations were given to a group
of about seventy-five VI, VII, and VIII grade pupils in the University
Elementary School at Iowa City, Iowa. Due to absences, the total
number of complete papers was reduced to 64, this number being
constant in all calculations reported here. Because, however, of
the close classification in effect in this school, the range of talent in
the three combined grades is less than that of a single unselected age
group or possibly even the average public school grade.

In order to make all comparisons as strictly comparable as possible,
several modifications of the customary uses of those tests have been
made. These can be described as follows:

1. In the Illinois Examination, the intelligence scores have not been
   used at all. In all of the test batteries, for purposes of
   calculating AQ's, Stanford Binet Mental Ages have been used.
   Educational age in the Illinois Examination means throughout this
   study the average of the educational ages for the two subjects,
   arithmetic and reading. (Reading age in turn is the average of rate
   and comprehension age scores.)

2. In the Lippincott-Chapman Test the same procedure for educational
   ages was followed as in the case of the Illinois Examination, i.e.,
   reading and arithmetic ages were averaged.

3. In using the Stanford Achievement Test, two procedures have been
   followed:

   (a) The average of arithmetic and reading ages has been determined
       for purposes of comparison with the two preceding tests, and,

   (b) The educational ages for the composite scores of the entire
       battery (six subjects or nine separate tests) have been
       calculated according to the regular procedure described in the
       manual.


Educational ages (1), (2), and (3a) are therefore quite comparable
since all are based upon reading and arithmetic alone. Educational
ages (3b) are useful as a check and criterion since they are based upon a
130-minute testing of high reliability (about 0.98 for unselected age
groups. See "Manual of Directions," pp. 15-16). The Binet mental
ages as well as the chronological ages are also serviceable as statements
of the range of talent in this experimental group of 64 subjects. The
notations in connection with the Stanford Achievement Test scores
of (a) and (b) are defined in the above statements and this usage is
consistent throughout this paper. The AQ is defined (except in
Table V) by the formula $AQ = \frac{EA}{MA}$, where AQ = the achievement
or accomplishment quotient, EA = educational age, and MA = Binet
mental age. In the AQ's of Table V, $AQ = \frac{EA}{CA}$, where CA
= chronological age.
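
As an illustration of the two definitions, the quotient for a single pupil may be computed as follows. This is a minimal Python sketch; the helper names and the ages of the example pupil are hypothetical, and the ages are written in the "years-months" form used in the tables.

```python
def to_months(age):
    """Convert an age written as 'years-months' (e.g. '14-4.1') to months."""
    years, months = age.split("-")
    return 12 * int(years) + float(months)


def quotient(educational_age, base_age):
    """Achievement quotient: educational age divided by the base age
    (mental age for the usual AQ, chronological age for the alternative)."""
    return to_months(educational_age) / to_months(base_age)


# Hypothetical pupil: EA 13-6, Binet MA 14-0, CA 12-0.
aq_mental_base = quotient("13-6", "14-0")
aq_chronological_base = quotient("13-6", "12-0")

print(f"AQ on mental age base:        {aq_mental_base:.3f}")
print(f"AQ on chronological age base: {aq_chronological_base:.3f}")
```

The same pupil thus falls below unity on the mental age base and above it on the chronological age base whenever the mental age exceeds the chronological age.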

Table I gives the essential facts about the range of talent employed.


Table I

                                      Mean, years     Sigma,
                                      and months      months
1. Chronological age                   12-10.8         17.0
2. Mental age (Binet)                  14- 4.1         24.6
Educational age:
3. Illinois Examination                14- 5.4         28.3
4. Lippincott-Chapman                  13- 6.7         19.3
5. Stanford Achievement Test (a)       14- 4.0         17.1
6. Stanford Achievement Test (b)       14- 0.3         17.4


Table II states the correlations¹ found for these separate variables
of Table I.


Table II

                                       2       3       4       5       6
1. Chronological age                 0.002    ...     ...    0.290   0.163
2. Mental age (Binet)                         ...     ...    0.782   0.780
3. Illinois Examination                               ...    0.772   0.814
4. Lippincott-Chapman                                        0.869   0.826
5. Stanford Achievement Test (a)                                     0.960
6. Stanford Achievement Test (b)
N = 64


¹ The probable errors are not stated. The number of cases is constant
at 64 throughout.
Table III gives the correlations between the AQ's and Binet IQ's.

Table III

                                         r
Illinois Examination                  -0.308
Lippincott-Chapman                    -0.664
Stanford Achievement Test (a)         -0.002
Stanford Achievement Test (b)         -0.763


Table IV gives the medians, means, and standard deviations for
the AQ's.


Table IV

                                      Median    Mean    Sigma
Illinois Examination                   99.7     ...     12.1
Lippincott-Chapman                     ...      ...      9.2
Stanford Achievement Test (a)          99.3     ...      9.0
Stanford Achievement Test (b)          98.1     98.6     9.2


Since there are some authors who have recommended the use of the 
chronological age as the denominator in the formula for the calculation 
of the AQ, it is interesting to compare the same measures given in 
Table IV with those obtained by the CA method. This alternative 
method has at least the one advantage of not being committed to any 
hypotheses about the correlation of educational ability with general 
intelligence. Reference to this point will be made later in the 
discussion. 

Table V presents the AQ’s figured from the chronological age base. 


Table V

                                      Median    Mean    Sigma
Illinois Examination                  110.7    112.7    20.0
Lippincott-Chapman                     ...     106.2    16.1
Stanford Achievement Test (a)         112.6    111.6    13.9
Stanford Achievement Test (b)         110.4    110.4    14.7
Binet IQ's                            110.6    112.7    18.7

Discussion of the Results. By reference to Table I, it will be seen
that the agreement among the three test batteries is far from perfect
for the means and standard deviations. It is difficult here to set up a
valid criterion of the probable truth. One way of providing such a
criterion is that of accepting the average of the three batteries
(Illinois, Chapman, and Stanford (b)). This gives us approximately 14-0
as the probable mean educational or achievement age for the group.
By this criterion the Stanford Achievement Test appears to be the most
accurately scaled, the Illinois Examination running about as much too
high as the Lippincott-Chapman test runs too low. The mean mental
age might also have some value as a criterion in this connection although
it cannot be stated with any great degree of certainty whether the mean
educational age will be found to equal, exceed, or fall short of the mean
mental age of a group of pupils. The safest assumption is that, under
ordinary school conditions, the educational ages will fall short of the
mental ages in the majority of cases in all probability. Whether this
a priori statement holds or not, the three test batteries will be found to
occupy the same rank positions on a scale of difficulties as they did by
the first criterion. At any rate, we might wish a closer agreement in
the means than that reported since the means are not very greatly
affected by the phenomena of unreliability and regression which exert
marked influences on the standard deviations.
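
The first criterion mentioned above, the average of the three battery means, is easily verified from the means of Table I. The following Python sketch is illustrative only; the helper names are hypothetical, and the three means are those reported in Table I for the Illinois Examination, the Lippincott-Chapman test, and the Stanford Achievement Test (b).

```python
def to_months(age):
    """Convert an age written as 'years-months' to months."""
    years, months = age.split("-")
    return 12 * int(years) + float(months)


def to_age(months):
    """Convert months back to the 'years-months' form, to tenths of a month."""
    years = int(months // 12)
    return f"{years}-{months - 12 * years:.1f}"


# Mean educational ages from Table I.
means = {
    "Illinois Examination": "14-5.4",
    "Lippincott-Chapman": "13-6.7",
    "Stanford Achievement Test (b)": "14-0.3",
}

criterion = sum(to_months(age) for age in means.values()) / len(means)
deviations = {name: to_months(age) - criterion for name, age in means.items()}

print("criterion mean educational age:", to_age(criterion))
for name, dev in deviations.items():
    print(f"{name}: {dev:+.1f} months from the criterion")
```

The deviations so obtained show the Illinois mean roughly as far above the criterion as the Lippincott-Chapman mean is below it, with the Stanford mean nearly coincident with it.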

When the standard deviations are considered, there are greater
differences in evidence. Two influences may be at work here; viz.,
(1) the age norms may be distorted at the extreme ranges due to
arbitrary scaling outside the limits of direct experimental
determination, and (2) the standard deviations are spuriously large to
varying degrees in the three batteries due to unreliability. In
connection with the first point mentioned, it should be noted that the
Illinois Examination norms as published in the manual of instructions
provide for achievement or educational ages covering a range of from 6-0
to 25-6 years, i.e., they are arbitrarily extended about 10 years beyond
the upper limits of direct experimental determination. This is not
necessarily a criticism since practically all educational and mental
tests have been thus extended, but the fact should be emphasized that
such extensions do magnify enormously the dangers of error from
distortion of the test scaling in the extreme ranges. It is moreover
difficult to interpret the meaning of an educational age of 25 years. The
Lippincott-Chapman scale covers a range from 10-0 to 17-0 years and the
Stanford Achievement Test permits a range of educational ages of 7-6 to
18-6 years, the ages above 14-6 being arbitrary extensions. The actual
ranges of educational ages by the several methods will throw some light
on this situation.



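                                      Range             Difference
Illinois Examination                  10-6 to 20-10       10-4
Lippincott-Chapman                    11-2 to 17-10        6-8
Stanford Achievement Test (a)         11-7 to 17-7         6-0
Stanford Achievement Test (b)         11-0 to 17-4         6-4
Mental age (Binet)                    10-0 to 18-8         8-8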



The agreement is substantial for the Lippincott-Chapman test
and the Stanford Achievement Test. The range for the Illinois
Examination is greater than the logic of the situation would seem to
suggest. Using mental age range as a criterion, the same conclusion is
indicated again upon the assumption that educational ages are likely
to be found to be less variable than mental ages. The standard
deviations of Table I are in accord likewise.

Turning to the more important factor of the effects of
unreliability, the standard deviations of Table I present even more
striking differences among themselves. The variability of the Illinois
Examination ages is very much greater than is the case with the other two
batteries, which agree rather closely. The variability of the Illinois
Examination is much greater than for Binet mental ages. When we
consider the working time limits of these several tests (using working
time as a rough index of reliability), the sigmas obtained are quite in
harmony with the expectations. These time limits are:


Illinois Examination, reading and arithmetic                   21½ min.
Lippincott-Chapman                                             78 min.
Stanford Achievement Test, reading and arithmetic              80 min.
Stanford Achievement Test, composite of all subjects, about    130 min.


That the variabilities are markedly affected by unreliability is shown
by the formula $\sigma_{\text{true}} = \sigma_{\text{obtained}}\sqrt{r_{12}}$,
where $r_{12}$ is the reliability of one form of the given test. The
standard deviations of all of the tests, it follows, are spuriously
large, this factor apparently being much more serious in the case of the
Illinois Examination. The reliabilities for the range of talent involved
in the present study are not known. The manual for the Illinois
Examination states the reliability coefficients of the separate tests as
0.76 for arithmetic, 0.72 for comprehension, and 0.79 for rate, for
ranges of talent equal to grades VI, VII, and VIII pooled (p. 31). The
standard deviations are unfortunately not stated but are undoubtedly
larger than those found for the group under study here. Doctor Chapman,
in a personal communication, estimated that the reliability of his test
would be about 0.80 to 0.90 for our group. The reliability coefficients
of the Stanford Achievement Test for unselected age groups are about
0.98 for the composite educational ages ("Manual," pp. 15-16). The range
of talent involved in such unselected age groups is from 22 to 24 months
of Stanford Achievement Test educational ages. For the combined
reading and arithmetic ages, the reliability is probably not far from
0.96 to 0.97 for the same range of talent. It will be recalled that the
range of talent in our experimental group was about 17 months in
terms of the educational ages of the Stanford Test.
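
The effect of the formula may be illustrated numerically. In the Python sketch below the obtained sigmas are those of Table I, but the reliability coefficients are assumed values chosen only for illustration, since the reliabilities for the present range of talent are not known; the function name is hypothetical.

```python
import math


def true_sigma(obtained_sigma, reliability):
    """Estimated standard deviation of true scores: sigma_obtained * sqrt(r)."""
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in (0, 1]")
    return obtained_sigma * math.sqrt(reliability)


# Obtained sigmas (months) from Table I; reliabilities are ASSUMED values.
illinois = true_sigma(28.3, 0.75)   # a brief test of modest reliability
stanford = true_sigma(17.4, 0.98)   # a long test of high reliability

print(f"Illinois Examination: 28.3 obtained, {illinois:.1f} estimated true")
print(f"Stanford (b):         17.4 obtained, {stanford:.1f} estimated true")
```

With any such figures the shrinkage is several months for the less reliable test and a fraction of a month for the more reliable one, although under these assumed reliabilities a part of the difference between the two sigmas remains.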

To summarize the foregoing discussion, although it is impossible
to reduce these differences in variability to a precise numerical
statement, it seems logical to suppose that the differences actually
found are chiefly matters of unreliability arising from the fact that a
very brief test (the Illinois Examination) is being compared with two
very much longer tests (the Lippincott-Chapman and the Stanford
Achievement Test). Ignoring possible differences in the accuracy of the
scalings of those tests, it seems that the main source of differences is
to be found in the varying degrees of fallibility in the tests compared.
The question can fairly be raised, in this connection, however, whether a
twenty minute test can be made sufficiently reliable for the purposes
of the AQ technique. This point constitutes the defense of the longer
examination like the Stanford Achievement Test. A further point to
the same effect is the question whether much meaning can be attached
to AQ's ranging greatly beyond the IQ's yielded by the best
intelligence tests, e.g., the Binet Scale. The Illinois Examination
provides for AQ's between the limits of 34 and 264. The range found in
this study was from 76 to 130. At the present time, Binet IQ's probably
represent our best estimate of the range of mental abilities in school
children.

The facts of Table II call for little comment. The correlations of
achievement ages with Binet mental ages are moderately high but far
from unity even with due allowance for unreliability. From the
known behavior of the Binet Tests with respect to unreliability, the
correlation between Binet mental ages and the educational ages yielded
by the Stanford Achievement Test could rise to 0.95 or higher instead
of less than 0.80 as found. The coefficients actually reported for the
various examinations are again in harmony with our conclusions with
respect to comparative reliabilities.

Table III shows that negative correlation between IQ and AQ is
the rule as has been found by Franzen and others. The differences
among the three examinations again fall in line with our former
interpretations on the basis of unreliability.

Table IV shows again the same tendencies reported in Table I;
viz., that the Lippincott-Chapman AQ's tend to be systematically
lower than those of the other two examinations. The standard
deviation of the Illinois Examination shows the same relatively larger
variability previously mentioned. As a criterion of probable truth, the
safest assumption here is that the mean and median AQ should
approximate to 100. Except for the Lippincott-Chapman Test, the
departures from this estimate of the truth are not great.

Table V (AQ's upon a chronological age basis) shows the same
differences in variability supposedly due to unreliability in varying
degrees. The central tendency of such AQ's is for these to approximate
the mean Binet IQ except in the case of the Lippincott-Chapman
test. These facts are in harmony with the foregoing tables.

GENERAL CONSIDERATIONS 

The reader is again referred to the discussion of the AQ technique by
Toops and Symonds for a statement of the issues and assumptions
underlying this method. In the remaining space at the writer's disposal
some of these assumptions will be re-examined in the light of our new
data.

1. The AQ procedure involves the assumption that educational
abilities are correlated to unity with mental age (in the sense of the
Binet Scale or some other measure of general intelligence), at least
when pupils are pushed to the limits of improvement (Franzen:
loc. cit., pp. 30ff). This, of course, implies due allowances for
unreliability of measures either by corrections for attenuation or some
other analogous procedure. After "pushing" the pupils throughout the
year, Franzen obtained correlations ranging from 0.70 to 0.85 where he
estimates that the reliability of his measures would permit correlation
up to about 0.96, for such abilities as vocabulary, arithmetic, reading,
and completion exercises (pp. 21-23). Without pushing the agreements
are much less perfect. That such correlations will rise to unity is,
therefore, a hope rather than a demonstrated fact. In spite of this
statement, there is undoubtedly a great deal of value in the use of the
AQ. This value will doubtless be found to vary greatly with the
particular school subjects, being greatest for reading and arithmetic
and becoming less and less for grammar, spelling, geography, manual
training, domestic science, handwriting, etc., in some order as yet
undetermined. Convincing experimentation must yet be done in
this connection before we can dogmatize. In view of the imperfections
of our measures of intelligence and educational ability, the AQ
is certain to be markedly fallible unless long-time testing is carried
out.

2. The AQ will have a larger probable error than either the numera¬ 
tor or the denominator in its formula since it, like all quotients, is 
influenced by the unreliability of both terms of its formula. 
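
The point may be illustrated with the usual first-order approximation for the error of a quotient. The Python sketch below assumes that the errors of EA and MA are independent and small relative to the ages themselves, an assumption introduced here for illustration and not made in the text; the function name and the probable error assigned to the mental age are likewise hypothetical.

```python
import math


def quotient_with_pe(numerator, pe_num, denominator, pe_den):
    """Return a quotient and its approximate probable error.

    First-order approximation for independent errors: the relative PE of
    the quotient is the root of the summed squares of the relative PEs.
    """
    q = numerator / denominator
    relative_pe = math.hypot(pe_num / numerator, pe_den / denominator)
    return q, q * relative_pe


# EA of 168 months with a PE of 2 months; MA of 172 months with a
# HYPOTHETICAL PE of 3 months.
aq, pe_aq = quotient_with_pe(168.0, 2.0, 172.0, 3.0)
relative_pe_aq = pe_aq / aq

print(f"AQ = {aq:.3f}, PE = {pe_aq:.3f}")
print(f"relative PE: EA {2 / 168:.4f}, MA {3 / 172:.4f}, AQ {relative_pe_aq:.4f}")
```

Under these assumptions the relative probable error of the quotient necessarily exceeds that of either term taken alone.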

3. The value of the AQ will be diminished by faulty grade location.
Probably not far from two-thirds of all pupils are at any given time
in the wrong grade in the sense that their educational abilities lie
nearer the norm of some grade other than the one in which they are
actually found. (The authors of the Stanford Achievement Test found this
to be the case in four California towns studied.) It is obvious that we
cannot expect the same educational quotients from two pupils having
mental ages of 12 years if one is placed in grade IV and the other in
grade VII, a situation which is not at all uncommon. If, on the
contrary, the AQ technique is used to help discover such errors of
classification, it will have real value.

4. The adoption of the AQ technique for purposes of motivation
and measurement of motivation will necessitate the establishment of
norms of maximal achievement, "pushed" norms in Franzen's
terminology. Franzen's contention that the AQ does not rise significantly
above 1.00 is based upon reference to such norms. His statement
certainly does not hold for norms established on unselected age groups
with ordinary "balanced" or normal emphasis on the various subjects
of instruction. The AQ's for the 64 subjects of this study rise above
1.00 in about 46 per cent of the cases with the Stanford Achievement
Test, the mean being very close to unity. The distribution of AQ's is
therefore practically normal in this situation.

5. There are certain arguments to be advanced in favor of AQ's
based upon chronological rather than mental age in that this procedure
is committed to no hypotheses about the correlations of mental traits.
It is a procedure that is analogous to the concept of the IQ. The
most valid objection to this method is that there is little correlation
between CA and EA in a given school grade, the relationship often
being negative (Table II).
6. The various educational test batteries and standard mental
tests are so lacking in direct comparability that unless a consistent
adoption of tests is made and followed, there will result endless
confusion. This is well brought out in the data reported in this study
even with the consistent use of Binet IQ's as a base.

Conclusions 

1. The comparisons of the three educational test batteries, the 
Illinois Examination, the Lippincott-Chapman Classroom Products 
Survey Tests, and the Stanford Achievement Test, show differences in 
the achievement ages yielded which are significantly large. 

2. Certain evidences were found that the Lippincott-Chapman 
Testa are scaled somewhat too low, resulting in lowered educational 
or achievement ages. 

3. The Illinois Examination appears to be considerably less reliable 
than the other two batteries, although the results are probably as 
good as can be expected in a minute test. The Lippincott- 
Chapman test appears to be more dependable, approaching in accuracy 
the newer and more extensive Stanford Achievement Test. 

4. Study of these three test batteries indicates strongly that 
achievement quotients, to be reliable, must be based on 30 or more 
minutes of testing in the case of most school subjects. The Illinois 
Examination is too brief in our opinion to give entirely satisfactory 
results. The Lippincott-Chapman Tests appear to furnish reliable 
measures for the two subjects of reading and arithmetic. The
Stanford Achievement Test probably is about as brief a test as is
consistent with scientific accuracy. It has the further advantage of
covering six separate fields of elementary school instruction, thus
yielding six separate subject ages as well as a single composite
achievement age. This composite age, at least, has the required reliability
for all practical purposes of measurement. (Its probable error equals
2 months of EA.)

5. Correlations between mental age and achievement age were
found to be moderately high for reading and arithmetic although not
approaching closely to unity in any case. Whether such correlations
ever can be made unity has not been demonstrated although Franzen
has shown that this is at least a possibility.

6. The AQ and the IQ are negatively correlated. 

7. Achievement ages are less variable than Binet mental ages when 
due allowance is made for unreliability. 



FORMULAS FOR THE CORRELATION BETWEEN RATIOS

KARL J. HOLZINGER
University of Chicago


The increasing use of ratios in educational measurement makes
suitable formulas for their treatment very necessary. Quite
frequently the correlation between two ratios is required. This may be
obtained by dividing each individual numerator by its denominator
and then correlating the N pairs of quotients thus found. An
alternative procedure is to obtain the required correlation between ratios
in terms of the correlation between the respective numerators and
denominators. This method eliminates the 2N divisions necessary
by the first scheme, and has certain theoretical advantages as well.
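Let $\frac{X}{Y}$ and $\frac{Z}{W}$ be the ratios correlated, the capital letters denoting
original scores or measures (small letters will be used later to denote
deviations from means). Further, let $M_x$, $M_y$, $M_z$ and $M_w$ be the
means of the undivided scores, $M_{x/y}$ and $M_{z/w}$ the means of the ratios
$\frac{X}{Y}$ and $\frac{Z}{W}$, $\Omega_{xy}$ the correlation between X and Y, and
$V_x = \frac{\sigma_x}{M_x}$, $V_y = \frac{\sigma_y}{M_y}$, and so on.

The arithmetical means and standard deviations of the ratio $\frac{X}{Y}$
are given in Yule* as follows:

$$M_{x/y} = \frac{M_x}{M_y}\left(1 - \Omega_{xy}V_xV_y + V_y^2\right) \tag{1}$$

$$\sigma_{x/y}^2 = \frac{M_x^2}{M_y^2}\left(V_x^2 - 2\Omega_{xy}V_xV_y + V_y^2\right) \tag{2}$$
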

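*Yule, G. U.: “Introduction to the Theory of Statistics.” Sixth edition,
Charles Griffin, London, 1922, pp. 216.

Since $N\sigma_x\sigma_y\Omega_{xy} = \Sigma(xy)$, the required correlation, $\Omega_{\frac{x}{y}\frac{z}{w}}$, may be written

$$\Omega_{\frac{x}{y}\frac{z}{w}} = \frac{\Sigma\left(\frac{X}{Y} - M_{x/y}\right)\left(\frac{Z}{W} - M_{z/w}\right)}{N\sigma_{x/y}\,\sigma_{z/w}} = \frac{\Sigma\left(\frac{X}{Y}\cdot\frac{Z}{W}\right) - N M_{x/y}M_{z/w}}{N\sigma_{x/y}\,\sigma_{z/w}}$$

If this last expression is expanded neglecting terms higher than the
second it reduces upon substituting (1) and (2) to the desired form,

$$\Omega_{\frac{x}{y}\frac{z}{w}} = \frac{\Omega_{xz}V_xV_z - \Omega_{xw}V_xV_w - \Omega_{yz}V_yV_z + \Omega_{yw}V_yV_w}{\sqrt{\left(V_x^2 - 2\Omega_{xy}V_xV_y + V_y^2\right)\left(V_z^2 - 2\Omega_{zw}V_zV_w + V_w^2\right)}} \tag{3}$$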

Six correlations and four means and standard deviations are required
to work out the value of $\Omega_{\frac{x}{y}\frac{z}{w}}$ from this expression so that it hardly
appears shorter than the method of dividing for the N pairs of
individual ratios. Sometimes, however, the above correlations are already
at hand for other purposes and in this case substitution in formula
(3) is a very simple matter.
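
The substitution can be illustrated by a short computation. What follows is a minimal sketch in Python, not part of the original derivation, of the second-order approximation to the correlation between two ratios in terms of the six correlations and four coefficients of variation of the undivided scores. It is assumed here that this approximation is the one intended by formula (3); the function name `ratio_correlation` and its argument names are illustrative only, and the numerical values are hypothetical.

```python
import math


def ratio_correlation(r_xz, r_xw, r_yz, r_yw, r_xy, r_zw, v_x, v_y, v_z, v_w):
    """Approximate correlation between the ratios X/Y and Z/W.

    r_ab are product-moment correlations between the undivided scores;
    v_a are coefficients of variation (standard deviation / mean).
    Terms above the second order are neglected.
    """
    numerator = (r_xz * v_x * v_z - r_xw * v_x * v_w
                 - r_yz * v_y * v_z + r_yw * v_y * v_w)
    denominator = math.sqrt(
        (v_x ** 2 - 2 * r_xy * v_x * v_y + v_y ** 2)
        * (v_z ** 2 - 2 * r_zw * v_z * v_w + v_w ** 2)
    )
    return numerator / denominator


# Hypothetical case: X, Z and a common divisor (Y = W, so r_yw = 1) are
# mutually uncorrelated and equally variable.
spurious = ratio_correlation(r_xz=0.0, r_xw=0.0, r_yz=0.0, r_yw=1.0,
                             r_xy=0.0, r_zw=0.0,
                             v_x=0.1, v_y=0.1, v_z=0.1, v_w=0.1)
print(round(spurious, 3))
```

Under these assumed values the sketch returns .5, the familiar spurious correlation between two indices that share a denominator, although none of the undivided scores are correlated.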

A simplification enters in when the denominators of the ratios are
the same, e.g. IQ’s by two different tests having identical chronological
age denominators. Formula (3) then reduces to,

$$\Omega_{\frac{x}{w}\frac{z}{w}} = \frac{\Omega_{xz}V_xV_z + V_w^2 - \Omega_{xw}V_xV_w - \Omega_{zw}V_zV_w}{\sqrt{\left(V_x^2 - 2\Omega_{xw}V_xV_w + V_w^2\right)\left(V_z^2 - 2\Omega_{zw}V_zV_w + V_w^2\right)}} \tag{4}$$

for which only three correlations and three means and standard
deviations are required. A still further simplification occurs when a
ratio is correlated with an undivided score. Thus for $\Omega_{x\frac{z}{w}}$ we may
substitute Y = 1 in the right hand member of (3) obtaining,

$$\Omega_{x\frac{z}{w}} = \frac{\Omega_{xz}V_z - \Omega_{xw}V_w}{\sqrt{V_z^2 - 2\Omega_{zw}V_zV_w + V_w^2}} \tag{5}$$

¹ “This result may be shown equivalent to $\Omega_{(x-y)(z-w)}$ when the means are
equal. It may be further noted that Professor Chapman's formula (X)
appearing in the February number of this Journal may only be used when the variables
are expressed in terms of standard deviations. My attention has also been
called to the fact that both results are included in an early paper by Pearson.”

This last expression is very useful in determining the validity of a
given ratio, $\frac{Z}{W}$, when X is the criterion. For example if Z = mental
age, W = chronological age and X an intelligence criterion, $\Omega_{x\frac{z}{w}}$ will
be high when $\Omega_{zw}$ is high, $\Omega_{xw}$ low, and $\Omega_{xz}$ high. A valid IQ should
then show high correlation between mental and chronological age,
high correlation between mental age and criterion, but low correlation
between chronological age and criterion. These statements are all
made with the understanding that the correlation between criterion
and IQ is a measure of the validity of the latter.

A further use of formula (5) is that of comparing the validity of a
ratio with that of a simple score. If $\Omega_{x\frac{z}{w}}$ is higher than $\Omega_{xz}$ the ratio
$\frac{Z}{W}$ would be preferable to the score Z on the basis of validity.
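
The comparison just described may be sketched numerically. The following Python fragment is offered only as an illustration: it assumes that formula (5) is the usual second-order expression for the correlation of a score X with a ratio Z/W, and the function name and the three correlations and two coefficients of variation used below are hypothetical values chosen for the example, not results from any study.

```python
import math


def ratio_with_score_correlation(r_xz, r_xw, r_zw, v_z, v_w):
    """Approximate correlation between a score X and the ratio Z/W.

    v_z and v_w are coefficients of variation (standard deviation / mean).
    """
    numerator = r_xz * v_z - r_xw * v_w
    denominator = math.sqrt(v_z ** 2 - 2 * r_zw * v_z * v_w + v_w ** 2)
    return numerator / denominator


# Hypothetical values: Z = mental age, W = chronological age, X = criterion.
r_xz, r_xw, r_zw = 0.60, 0.20, 0.80
validity_of_ratio = ratio_with_score_correlation(r_xz, r_xw, r_zw,
                                                 v_z=0.15, v_w=0.15)
# The ratio is preferred to the plain score Z when it correlates
# more highly with the criterion than Z does.
ratio_preferable = validity_of_ratio > r_xz
print(round(validity_of_ratio, 3), ratio_preferable)
```

With these assumed figures the ratio correlates about .63 with the criterion against .60 for the undivided score, so the ratio would be retained.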

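In case a measure of the reliability of a ratio is desired formula (3)
may be written in the form,

$$\Omega_{\frac{x_1}{y_1}\frac{x_2}{y_2}} = \frac{\Omega_{x_1x_2}V_{x_1}V_{x_2} - \Omega_{x_1y_2}V_{x_1}V_{y_2} - \Omega_{x_2y_1}V_{x_2}V_{y_1} + \Omega_{y_1y_2}V_{y_1}V_{y_2}}{\sqrt{\left(V_{x_1}^2 - 2\Omega_{x_1y_1}V_{x_1}V_{y_1} + V_{y_1}^2\right)\left(V_{x_2}^2 - 2\Omega_{x_2y_2}V_{x_2}V_{y_2} + V_{y_2}^2\right)}} \tag{6}$$

where $\frac{X_1}{Y_1}$ and $\frac{X_2}{Y_2}$ are the ratios in successive trials with a test, and
$\Omega_{x_1x_2}$ and $\Omega_{y_1y_2}$ the ordinary reliability coefficients of the undivided
variables. By suitable choice of origins all of the V's may be made
equal giving,

$$\Omega_{\frac{x_1}{y_1}\frac{x_2}{y_2}} = \frac{\Omega_{x_1x_2} + \Omega_{y_1y_2} - \Omega_{x_1y_2} - \Omega_{x_2y_1}}{2\sqrt{\left(1 - \Omega_{x_1y_1}\right)\left(1 - \Omega_{x_2y_2}\right)}} \tag{6'}$$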

It is again apparent that high correlation between numerator and
denominator increases the reliability as well as the validity of a ratio
$\frac{X}{Y}$. Furthermore high reliability coefficients, $\Omega_{x_1x_2}$ and $\Omega_{y_1y_2}$, and low
coefficients, $\Omega_{x_1y_2}$ and $\Omega_{x_2y_1}$, have the same effect.

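Finally if $\Omega_{x_1y_1} = \Omega_{x_2y_2} = \Omega_{x_1y_2} = \Omega_{x_2y_1} = \Omega_{xy}$, equation (6)'
reduces to,

$$\Omega_{\frac{x_1}{y_1}\frac{x_2}{y_2}} = \frac{\Omega_{x_1x_2} + \Omega_{y_1y_2} - 2\Omega_{xy}}{2\left(1 - \Omega_{xy}\right)} \tag{6''}$$

If $\Omega_{x_1x_2} = \Omega_{y_1y_2} = 1$, expression (6)" of course becomes equal to
unity also. In using formula (6)" $\Omega_{xy}$ may be taken as the average
of $\Omega_{x_1y_1}$ and $\Omega_{x_2y_2}$ or $\Omega_{x_1y_2}$ and $\Omega_{x_2y_1}$, and a rough approximation to
$\Omega_{\frac{x_1}{y_1}\frac{x_2}{y_2}}$ thus obtained. For very careful work, however, it is best to
use formulas (6) or (6)'.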
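
The rough approximation may be tried with a few lines of Python. This is a sketch only: it assumes that the simplified formula takes the form (reliability of numerator + reliability of denominator − twice the numerator-denominator correlation) divided by twice (one minus that correlation), and the function name and the figures supplied are hypothetical.

```python
def ratio_reliability(r_x1x2, r_y1y2, r_xy):
    """Rough reliability of a ratio X/Y from the simplified formula.

    r_x1x2, r_y1y2: reliability coefficients of numerator and denominator.
    r_xy: average correlation between numerator and denominator.
    """
    if r_xy >= 1.0:
        raise ValueError("r_xy must be less than 1")
    return (r_x1x2 + r_y1y2 - 2.0 * r_xy) / (2.0 * (1.0 - r_xy))


# Hypothetical reliabilities of .90 for both parts, with r_xy = .50.
print(round(ratio_reliability(0.90, 0.90, 0.50), 3))
# Perfectly reliable parts give a perfectly reliable ratio.
print(round(ratio_reliability(1.0, 1.0, 0.30), 3))
```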



THE VALIDATION OF INTELLIGENCE TESTS 

A. M. JORDAN 
University of Arkansas 

The literature dealing with intelligence tests during the last few 
years has been unusually copious; in fact, each month it constitutes the 
major portion of three or four journals. This material for the most 
part has concerned itself with the invention and application of intel¬ 
ligence tests. That the results are valid has been taken for granted. 
Owing to (1) the prestige of the names of the makers which was usually 
sufficient to insure the careful statistical treatment of the data used in 
the construction of the tests and to (2) the pressing need for just such 
instruments, many accepted them uncritically. The cordial
acceptance by workers of these group tests of intelligence led to the
construction of several other tests of a similar nature until now there are as
many as 16 on the market, all claiming to be measures of intelligence.
There would be no objection to this condition provided that whenever
a child, or a group of children, is measured on each test closely similar
results obtained. Unfortunately this is not the case. Wide variations
are found both in measurements of individuals and in the measurements 
of groups—much greater variation even than is necessary when gener¬ 
ous allowances have been made for that unknown factor, ‘‘variations 
in human nature.” 

Now as long as there was only one claimant to the papal throne of
Saint Peter the people were not unwilling to believe that the Pope was
the one divinely appointed, but when after the “Babylonian
Captivity” there were several claimants to this throne the power of the
Pope suffered because the people began to wonder which one, if any,
was the real pope. So here, when instruments varying in their results
claim to measure intelligence it becomes necessary to make a careful
study of each group test and of each sub-test in order to discover if
possible which is the best measure of intelligence. A small beginning
of this work is attempted in this paper.

The writer is not unmindful of the good work that has been done
in both the criticism and the validation of intelligence tests. In the
former case both expert and lay criticism have appeared. Such
experts as Bridges and Stenquist have laid their hands to
this task. Bridges, finding such small correlations with school marks
in universities and realizing the small chance of a correct prophecy of
class standing from such a correlation, concludes “That their general
use in universities with the object of helping the above mentioned
administrative and educational problems is absolutely contraindicated,”
and again, “The results are not only disappointing as to
their usefulness but may be actually misleading.” Stenquist also
finds much to criticise in intelligence tests as now constituted. He 
points to the smallness of the correlation with mechanical ability and 
to the discrepant results obtained by individual pupils in several tests. 
Even when these two rather extreme accounts are tempered with the
more hopeful findings of Colvin and MacPhail in their emphasis
on the usefulness of intelligence tests for foretelling extreme grades in
college when the correlation is low, still it seems evident that workers
in mental tests are becoming more and more critical of these
instruments of measurement. Nor is this criticism confined to the
professional students of the subject, for Walter Lippman after reading
a few books on the subject criticised intelligence tests in various ways
pointing out among other things both the lack of agreement in the
definition of “intelligence” and the fact that intelligence testers hid
behind a mathematical and technical smoke screen and concluded that
intelligence testing is one of the Babu sciences.

This dissatisfaction with results has led to the studies of the
validation of intelligence tests. Among these are the studies of
Breed and Breslich, Holley, Franzen, Gates, Root, and of the
writer. Breed and Breslich made a very thorough study of the
use of intelligence tests in the classification of pupils. Holley's
study is concerned with mental tests for school use. He uses six group
tests with large numbers of children and computes correlations for
each of the grades from III to XII. This writer, however, makes
no attempt to correlate the sub-tests with his criterion of marks nor
to evaluate them with any other criterion. His was the practical
problem of determining the most helpful test for school use. He does
correlate with each other the various sub-tests called by the same name.
Franzen had more specifically in mind the general evaluation of the
various tests. He computed the intercorrelations for several tests
and obtained the correlations again with the factor of reading constant
by the eminently desirable device of partial correlation. Gates uses
a composite of educational tests as a criterion with which to correlate
several group tests of intelligence and the Stanford-Binet individual
tests. This is one of the most elaborate studies that we have, using as
it does both partial and multiple correlations and extending from
grades I to VIII. The results obtained, however, were invalidated for
the single grades since “There were about 20 pupils to the grade,”
which number is too small for reliable correlations. His most
significant finding is that “Other things being equal the more verbal the
material the higher the correlation with school attainment.” Root
gave various group tests and the Stanford-Binet to 416 pupils in grades
I to XII and then correlated each of the group tests with the Stanford-
Binet. He concludes that “The varied character of the Otis Tests
makes it more valuable in analysis than either the Dearborn or
Mentimeter alone.” (This means that Otis is best.) Finally the
writer made correlations between four tests of intelligence and of their
sub-tests (31 in all) with grades. Thus sometimes with school marks,
sometimes with other tests correlations have been made but few, if
any, investigations have used several criteria, or included the sub-tests
in their computations.

The present study differs from those preceding in several respects 
investigating as it does the following problems: 

I. 1. The correlation of the four group tests (Army Alpha, Terman, 
Miller, and Otis) and the sub-tests of each with the Stanford-Binet 
Tests. 

2. The correlation of the four group tests and the sub-tests with 
the factor of age. 

3. The correlation of the four group tests and the sub-tests 
with school marks. (This last work was reported in a previous 
article but is summarized and discussed here as a part of the total 
study.) 

4. The correlation of the four group tests and the sub-tests with
a learning test devised by the writer.

5. The correlations of four group tests and the sub-tests with a
composite of four tests.

II. All these correlations except two were computed again with the 
factor of age partialed out. 

III. Rougher estimates were made of the value of each test by 
comparing it with each of the others to see how many pupils were 
maintained in corresponding thirds and to discover by means of 
assuming r to be perfect and then calculating the differences between 
actual scores and transmuted scores which test was most consistent 
in its results. 

IV. A collection was made of most of the published coefficients
of correlation computed with these tests and a classification of
them formed.

V. A bibliography of studies, in which the correlation coefficients
are computed between the tests as a whole or the sub-tests and some
criteria, was compiled.

VI. The discovery of the best sub-test or sub-tests of the 31 
was made. 

VII. The discovery of the best group test of the four was made. 

VIII. Discussion and conclusion. 

Description of Method

Sixty-four pupils of high school age took each of the four group 
tests of intelligence, the Stanford-Binet, the Learning Test, and had 
their intelligence rated by their teachers. These 64 are included in 
each and every correlation computed. Concerning the first criterion 
it is sufficient to say that the tests were given by individuals trained by 
me during a period of three months and by myself. The responses
in all cases were taken down verbatim and all were scored by me. In
obtaining teachers’ estimates of intelligence unusual precautions were
taken to obtain accurate ratings. In the first place the four teachers
whose estimates were used were men and women of maturity, critic
teachers in our training high school, all of them having taken courses in
psychology. A list of the names of all pupils was issued to them with
the following instructions:

1. Rate as many pupils as you know well. Zero is lowest; ten is
highest; five is medium.

2. You may give fractional scores if you desire.

3. Rate for general intelligence, by which is meant

(a) Tendency to take and maintain a definite direction in
thinking.

(b) The capacity for making adaptations for the purpose of
obtaining the desired end.

(c) The power of self-criticism.

The average of these four ratings was used as a second criterion of
intelligence.

The third criterion is an ideational learning test which may be
described as follows: Write the letter occurring midway between the
following letters assuming that each letter has a number according to
its position in the alphabet. For example, a is 1, b is 2, c is 3, etc.
The letter midway between 1 and 3 will be b. Then followed 35 pairs of
numbers such as 4-8, 6-10, 10-14, 13-17, etc. between which the
correct letter was to be located. The pupils had explained to them
fully what the procedure was and all but two or three comprehended
the instructions. All understood perfectly at the beginning of the
second practice period. During one hour, 12 practice periods of three
minutes each were obtained. The difference between the average of
the first three and that of the last three was used as the third criterion
of intelligence, the thought being that if intelligence tests really
measure capacity to learn, then the amount learned between certain
times when all were trying should correlate to some extent with the
tests.
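
The scoring of this learning test may be made concrete with a minimal Python sketch. It is an illustration under stated assumptions rather than the writer's actual scoring routine: the function names are hypothetical, the improvement is taken as the last-three average minus the first-three average (the text does not state the sign), and the twelve period scores below are invented.

```python
import string


def midway_letter(low, high):
    """Letter whose alphabet position lies midway between two positions (a = 1)."""
    if (low + high) % 2:
        raise ValueError("no single letter lies midway between these positions")
    return string.ascii_lowercase[(low + high) // 2 - 1]


def improvement_score(period_scores):
    """Average of the last three practice periods minus that of the first three."""
    if len(period_scores) < 6:
        raise ValueError("need at least six practice periods")
    first = sum(period_scores[:3]) / 3
    last = sum(period_scores[-3:]) / 3
    return last - first


print(midway_letter(1, 3))   # the worked example in the instructions
print(midway_letter(4, 8))   # first of the listed pairs

# Invented scores (items correct) for one pupil over 12 practice periods.
periods = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
print(improvement_score(periods))
```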

In addition, correlations were made with age and this factor made
constant in all correlations by the device of partial correlations.
I am indebted to Professor Thorndike for this suggestion. Since,
however, practically all the tests correlated negatively with age the
effect of this procedure was to reduce slightly (.01 to .06) the
correlations obtained.

The second part of the program for evaluating the uses of these
four tests is concerned with testing the internal consistency of the tests.
That is, if the tests measure approximately the same mental processes
then there should be an approximate consistency of ranking among the
four tests so that if the pupils were divided into thirds by means of
the scores on one test they should fall approximately into
corresponding thirds when another test was used. Secondly, if we take the
regression equation $x_1 = b\,x_2$, where $b = r\frac{\sigma_1}{\sigma_2}$, and assume that the
correlation is perfect, then it is possible to obtain the score in $x_1$ which
could be expected from $x_2$. (These suggestions came to me largely from
Breed and Breslich in their article in School Review.) The
transmutations have been effected in all the tests and these transmuted
numbers subtracted from the real scores and the averages of these
differences computed. This gives a measure of likeness or of
unlikeness between the scores of the various tests. Finally a composite was
made by transmuting by the just mentioned method the scores of
three tests into one test and taking an average of these transmuted
scores and the actual score. This average was used as a criterion
against which to measure each of the group tests.
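
The three devices described in this paragraph (agreement in thirds, transmutation with the correlation taken as perfect, and the composite) can be sketched in Python as follows. This is a hedged illustration, not the computing routine actually used: all function names are hypothetical, the differences are averaged without regard to sign (the text does not say), and the scores are invented. With r = 1 the regression line simply matches the means and standard deviations of the two tests.

```python
import statistics


def transmute(source_scores, target_scores):
    """Express source-test scores on the target test's scale, taking r = 1."""
    mean_source = statistics.fmean(source_scores)
    mean_target = statistics.fmean(target_scores)
    slope = statistics.pstdev(target_scores) / statistics.pstdev(source_scores)
    return [mean_target + slope * (s - mean_source) for s in source_scores]


def mean_absolute_difference(actual, transmuted):
    """Average size of the gap between actual and transmuted scores."""
    return statistics.fmean([abs(a - t) for a, t in zip(actual, transmuted)])


def third_of_each(scores):
    """Return 0, 1 or 2 (lowest to highest third) for every pupil."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    thirds = [0] * n
    for rank, pupil in enumerate(order):
        thirds[pupil] = rank * 3 // n
    return thirds


def pupils_in_same_third(scores_a, scores_b):
    """Number of pupils placed in corresponding thirds by two tests."""
    pairs = zip(third_of_each(scores_a), third_of_each(scores_b))
    return sum(a == b for a, b in pairs)


def composite(base_scores, other_tests):
    """Average of the base score and the other tests transmuted to its scale."""
    converted = [transmute(other, base_scores) for other in other_tests]
    return [statistics.fmean([base, *rest])
            for base, *rest in zip(base_scores, *converted)]


# Invented scores for six pupils (not data from the study).
test_a = [10, 20, 30, 40, 50, 60]
test_b = [25, 45, 65, 85, 105, 125]   # an exact linear function of test_a
b_on_a_scale = transmute(test_b, test_a)
print(mean_absolute_difference(test_a, b_on_a_scale))
print(pupils_in_same_third(test_a, test_b))
print(composite(test_a, [test_b]))
```

When one test is an exact linear function of the other, as in this invented case, the transmuted scores reproduce the actual scores, every pupil falls in the corresponding third, and the composite coincides with the base test; departures from these values measure the unlikeness of two tests.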

RESULTS

The following tables set forth the results of the correlations of the
group tests and the sub-tests with the various criteria.

Table I.—Correlations of the Four Tests and Sub-tests with the Stanford-
Binet Mental Age. N = 64

Sub-test      Otis            Alpha           Miller          Terman
              MA     PE       MA     PE       MA     PE       MA     PE
1             .626   [?]      .243   .070     .360   .073     .402   .084
2             .601   .066     .640   .060     .407   .003     .384   .071
3             .275   .077     .420   .000     [?]    .005     .672   .057
4             .508   .067     [?]    .062     ....   ....     .460   .007
5             .661   .068     .602   .050     ....   ....     .471   .006
6             .482   .005     .601   .050     ....   ....     .441   .007
7             .430   .007     .470   .000     ....   ....     .426   .009
8             .384   [?]      .607   .002     ....   ....     .437   .008
9             .307   [?]      ....   ....     ....   ....     .340   .074
10            .414   [?]      ....   ....     ....   ....     .637   .000
Group test    .000   .047     .087   .044     .630   .000     .080   .046



Table II.—Correlations of Four Group Tests and Sub-tests with Stanford-
Binet Mental Age, the Factor of Age Held Constant

Sub-test      Otis     Alpha    Miller   Terman
              MA       MA       MA       MA
1             .500     .214     .310     .430
2             .677     .640     .435     .383
3             [?]      [?]      [?]      .565
4             .662     [?]      ....     .432
5             .641     [?]      ....     .400
6             .400     .486     ....     .420
7             .420     .462     ....     .408
8             .306     .480     ....     .428
9             .331     ....     ....     .382
10            .300     ....     ....     .820
Group test    [?]      .076     .611     .060


Alpha-4 (Opposites) stands above all sub-tests in correlation with
mental age (.61). It is higher even than all the sub-tests of the Miller
Test combined and only slightly behind the other group tests.
Ranking close to Alpha-4 are Otis-2 (.59), Terman-3 (.57), and Otis-4
(.558). The two highest, Alpha-4 and Otis-2, are both opposites as is
also Terman-3, while Otis-4 is made up of proverbs. When the group
tests as a whole are considered Alpha, Terman, and Otis are practically
the same while Miller shows a distinctly lower correlation.

Table II was deduced from Table I by considering also Table III.

Table III.—Correlations of Four Group Tests and of the Sub-tests with
Age. N = 64



The formula used was

$$r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}$$

by which means
the correlation existing between say Otis Group Test and mental age
may be obtained irrespective of the factor of the relation of each to the
third factor, age; or, to say it another way, with the factor of age
constant. It would perhaps be more precise to have all the pupils
of the same age measured. If the latter were impossible the partial
correlation device is next best. However, this procedure must be
used with some caution. Yule says: “Hence $r_{12.3}$ should be regarded in
general as of the nature of an average correlation; the cases in which it
measures the correlation between $x_{1.3}$ and $x_{2.3}$ for every value of $x_3$—
are probably exceptional.”* The effect of this procedure is slight
showing that in these 64 cases the age factor does not affect the
correlations to any large degree.
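
The first-order partial correlation is readily computed. The following Python lines are a minimal sketch of the formula just given; the function name is illustrative, and the three coefficients supplied are hypothetical (the correlation of mental age with chronological age in particular is assumed for the example, not taken from the tables).

```python
import math


def partial_correlation(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical values: 1 = group test, 2 = mental age, 3 = chronological age.
r_test_ma = 0.66
r_test_age = -0.41
r_ma_age = -0.30   # assumed for illustration only
print(round(partial_correlation(r_test_ma, r_test_age, r_ma_age), 3))
```

When both variables are uncorrelated with the third, the partial coefficient equals the original coefficient, which is one quick check on any such routine.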

Again referring to Table III it is seen that almost all intelligence 
tests correlate negatively with age during the high school years. 
Younger children not only have higher IQ’s but have attained higher 
mental levels than the older pupils. The correlations with age were: 


Otis ............ -.41
Miller .......... -.32
Alpha ........... -.28
Terman .......... -.24


^ Yule, G. U.: “An Introduction to the Theory of Statistics,” pp. 261.
^ Op. cit., p. 252.

[Table IV.—Correlations of group tests and sub-tests with teachers’
estimates of intelligence, with PE. The entries of this table are not
recoverable.]

All the group tests and all the sub-tests save one, Terman-9, correlated
negatively with age. The five that correlated lowest are:

Otis-9 .......... -.45
Terman-3 ........ -.38
Otis-4 .......... -.35
Otis-6 .......... -.33
Miller-3 ........ -.33


If we believe with Chapman and Dale that “A test element in which
the performance of the Young Bright exceeds that of the Dull Old is,
except in unusual circumstances, ipso facto, a superior test of
intelligence,”* then the negative correlation with age would be quite
significant and it could then be inferred that the higher the negative
correlation with age the better the test is as a measure of intelligence.

In Table IV are included the correlations with teachers’ estimates
of intelligence. It is well to remember that the estimates were made
by four mature teachers and that each pupil’s position was determined
by the average rating given him by all four teachers.


Table V.—Correlations of Group Tests and Sub-tests with Teachers’
Estimates, the Factor of Age Constant. N = 64

Sub-test      Otis     Alpha    Miller   Terman
              (estimates of intelligence)
1             [?]      .188     [?]      .530
2             [?]      .022     [?]      .448
3             .232     .421     .538     [?]
4             [?]      .663     ....     .442
5             .333     .354     ....     .189
6             .694     .306     ....     .390
7             [?]      .388     ....     .183
8             [?]      .521     ....     .364
9             [?]      ....     ....     [?]
10            .366     ....     ....     .618
Group test    .090     .576     .044     [?]



* Chapman, J. C., and Dale, A. B.: A Further Criterion for the Selection of
Mental Test Elements, Jr. Ed. Psy., Vol. XIII, pp. 273-274.

In considering correlations with teachers’ estimates of intelligence
the group tests give

Otis ............ .73
Alpha ........... .61
Terman .......... .66
Miller .......... .68

which become slightly lower when corrected for age. The five
highest sub-tests are

Miller 2 ........ .63
Army 2 .......... .63
Otis 6 .......... .60
Alpha 4 ......... .59
Miller 3 ........ .58

The average correlation for the four group tests with mental age is
.64; with teachers’ estimates of intelligence .67. The effect of making
age constant is to decrease the coefficients from .01 to .06.

In considering the correlations with the learning test (Table VI)
one must remember that this test is comparatively simple and that the 
amount of improvement made was the criterion used. 


Table VI.—Correlations of Group Tests and Sub-tests with a Learning
Test. N = 64

Sub-test      Otis            Alpha           Miller          Terman
              LT     PE       LT     PE       LT     PE       LT     PE
1             .243   .079     .120   .083     .234   .070     .180   .080
2             .116   .083     .307   .076     .103   [?]      .231   .070
3             .183   .081     .102   .081     .160   .082     [?]    .070
4             .170   .082     .206   .078     ....   ....     .210   .080
5             .200   [?]      [?]    .081     ....   ....     .313   .076
6             .274   .078     [?]    [?]      ....   ....     [?]    .083
7             .293   .077     .141   .083     ....   ....     .136   .083
8             .170   .081     .162   .083     ....   ....     -.125  .083
9             .172   .082     ....   ....     ....   ....     [?]    .064
10            .003   .084     ....   ....     ....   ....     [?]    .084
Group test    .228   .080     .200   .080     .174   [?]      .207   .080

(LT = correlation with the learning test.)

Table VII.—Correlations of Group Tests and Sub-tests with a Learning
Test, the Factor of Age Constant. N = 64

Sub-test      Otis     Alpha    Miller   Terman
              (learning test)
1             .200     .004     .208     .166
2             .042     .270     .001     .230
3             .003     .114     .120     .140
4             .078     .203     ....     .172
5             .102     .148     ....     .313
6             .101     .167     ....     .064
7             .270     .081     ....     [?]
8             .120     .103     ....     .102
9             [?]      ....     ....     [?]
10            -.088    ....     ....     .020
Group test    .118     .130     .144     .143



The correlations with this factor are low: 


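Otis ............ .23
Army ............ .21
Terman .......... .21
Miller .......... .17

The five highest of the sub-tests are:

Terman-5 ........ .31
Army-2 .......... .31
Otis-7 .......... .29
Otis-6 .......... .27
Army-4 .......... .26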


If intelligence is defined as the capacity to learn then the type of
material to be learned will have to be defined, for in the type of learning
in which an individual imaginally places a letter between two other
letters designated by numbers the correlations are too low to be very
significant.

And finally the four group tests were combined into one composite
and this was used as a criterion. This criterion is fairly important
because with high school pupils there is accumulative evidence that
two or three group tests combined give unusually good measurements
of intelligence.

The composite was made by computing the regression equations
for each pair of group tests, then by assuming that the correlation
occurring in the equation was 1.00; secondly, by means of the equations
thus found substituting each pupil’s score in one part of the equation
and thus obtaining the most probable score in the other. All scores
were converted into Otis scores and the three converted scores together
with the Otis score were averaged, this being used as a composite. For
further description see the article by Breed and Breslich referred to in
the bibliography and also the later pages of this paper.

Table VIII.—Correlations of Group Tests and Sub-tests with a Composite
Made up of Alpha, Miller, Otis and Terman. N = 64

Table IX.—Correlations of Group Tests and Sub-tests with a Composite
Made up of Alpha, Miller, Otis and Terman, the Factor of Age Being
Kept Constant. N = 64



Considering the tests as a whole there seems to be no real difference
among the four tests in their correlation with this composite. We
expect a somewhat high correlation with a composite since each group
test itself is a part of the composite but no such similarity of results
could appear unless there were a marked similarity among the mental
functions tested. There are considerable variations among the
sub-tests. The five highest here are:

Terman-3 (opposites) ............ .83
Miller-2 (cause and effect) ..... .79
Otis-4 (proverbs) ............... .78
Miller-3 (analogies) ............ .78
Alpha-4 (opposites) ............. .78

As a whole the correlations of the sub-tests are higher than they are
with any other criterion. The two lowest of these correlations are

Alpha-1 (oral directions) ....... .39
Alpha-6 (number completion) ..... .41

The general effect of making age constant is to make the correlations
somewhat smaller (rarely over .05) in most cases but in some few cases
to increase very slightly the size of the coefficient.

Let us now consider the various types of material which make up the
sub-tests. Seventeen different varieties of material compose the 31
sub-tests. Analogies and mixed sentences occur in all four tests,
arithmetic and opposites in three; information, number series, best
answers, and hard directions in two; and sentence meaning,
classification, cause and effect, geometric figures, proverbs, narrative completion,
similarities, logical selection, and memory occur in one each. Table X
shows the five tests correlating highest with various criteria.

Table X.—The Highest Correlations among the Sub-tests with Six
Criteria. (Figures to the Left Indicate the Number of Sub-tests that
Went to Make Up the Score Indicated)

Learning test                          Grades
3 Arithmetic reasoning ....... .28     3 Arithmetic reasoning ....... .40
1 Geometric figures .......... .27     1 Sentence meaning ........... .46
1 Logical selection .......... .22     4 Analogies .................. .44
2 Best answers ............... .21     4 Mixed sentences ............ .43
3 Opposites .................. .21     1 Classification ............. .40
Average ...................... .24     Average ...................... .44

Mental age                             Teachers’ estimates
3 Opposites .................. .59     1 Cause and effect ........... .63
1 Proverbs ................... .57     1 Geometric figures .......... .60
3 Arithmetic reasoning ....... .52     1 Similarities ............... .58
2 Number series .............. .52     3 Opposites .................. .58
2 Information ................ .50     3 Arithmetic reasoning ....... .56
Average ...................... .54     Average ...................... .59

Lowest with age                        Composite of Alpha, Otis, Miller, Terman
1 Narrative completion ....... -.46    3 Opposites .................. .79
1 Geometric figures .......... -.37    1 Cause and effect ........... .79
1 Proverbs ................... -.35    1 Proverbs ................... .78
3 Opposites .................. -.29    2 Information ................ .73
1 Memory ..................... -.26    1 Following directions (written) .72
Average ...................... -.34    Average ...................... .76


The opposite test stands out clearly ahead of the rest since it leads
in its correlation with mental age and composite, comes fourth with
teachers’ estimates and with age, and appears fifth in the learning test.
It seems clear therefore that it would be unwise to omit it in any group
of tests which were intended as a measure of intelligence. Arithmetic
reasoning comes next followed by geometric figures and proverbs.
It is rather interesting to know that every one of the sub-tests is
represented in these thirty correlations.

It also seems worth while to tabulate the four group tests together
somewhat in summary of this section of the paper. The correlations
with grade were taken from an article by the writer.


Table XI.—Summary of Correlations of Group Tests with Various
Criteria. N = 64

          Learning   Marks     Mental    Teachers’   Lowest      Composite  Average  Rank
          test                 age       estimates   with age
Otis      .228(1)    .480(4)   .000(?)   .730(1)     -.408(1)    [?]        .607     1
Alpha     .200(3)    .470(5)   .087(1)   .013(4)     -.283(3)    [?]        .620     3
Miller    .174(4)    .470(6)   .630(4)   [?]         -.320(2)    [?]        .613     4
Terman    .207(2)    .402(1)   .080(2)   [?]         -.244(4)    .006(3)    .532     2



If we rank each of these tests within each of the criteria, then average
the ranks, some indication may be had as to which is the best instrument
for all-round purposes. As far as our data go for testing intelligence
in high schools, Otis is the best test of the four, since it scored four
first places and had an average of .567. The Terman group test ranks next,
although there is little to choose between it and Army Alpha, while
Miller is only a little behind Alpha. The Miller Group Test receives
no first place in any of the criteria chosen. This much for the data
considered in mass; analytically they tell a somewhat different story.
For example, the differences between the correlations of the four tests
with the composite are negligible, as are also those with the learning
test and with the grades. On the contrary, with mental age, teachers'
estimates, and chronological age the differences are quite significant.
It is noteworthy that the positive correlations with the learning test
are grouped around two-tenths; with marks around four and one-half;
with mental age around five and six; with teachers' estimates around
six and seven; and with the composite around nine.

Thus far nothing has been said about the value of the criteria
chosen. Nobody knows exactly which should have the most weight
although, personally, I should put first in this case, teachers' estimates;
second, Stanford-Binet mental age; and third, grades.

The second division of the study concerns the consistency of the 
data. Tests that measure approximately the same mental processes 
should, it seems, within limits, place individuals in corresponding ranks 
in the four tests. Consequently if the scores in the tests were ranked 
from lowest to highest, the lowest third in one test would correspond 
largely with the lowest third in another. The following table throws 
light on this question. 

Table XII.—Displacements from Corresponding Thirds in Related Tests.
N = 64

                      Otis    Otis     Terman   Otis     Alpha    Terman
                      Alpha   Terman   Alpha    Miller   Miller   Miller
r                     .84     .78      .71      .81      .77      .79
I                     7       9        12       6        8        8
II                    10      13       10       8        10       8
III                   6       6        8        6        6        4
Per cent              34      42       47       28       37       31¹
Double displacement   1       2        5        2        2        1


¹ In a recent article (31) the writer published computations from a table of
Thorndike's which indicated that with a correlation of .70 the percentage of each
third in one test falling in the corresponding third of the second would be
theoretically fifty-five out of 100, and for .90, seventy-one out of 100. Note that
with a correlation of .71 we have fifty-three out of 100 in their corresponding
thirds; with a correlation of .84, sixty-four out of 100; with a correlation of .79,
sixty-nine out of 100. Let us compare with these data some from Breed and
Breslich (8). These authors find that with an r of .69 there is a correct placing of
68.6 per cent in corresponding thirds; with a correlation of .77, 70 per cent were
placed in corresponding thirds. Thus we have some empirical evidence to compare
with the theoretical expectancy.

From this table, Otis and Miller seem most alike since only 18
persons out of 64, or 28 per cent, were placed incorrectly. But even at
the best the condition seems bad enough. Here are pairs of tests with 
a correlation above .70, called “high” by Rugg and others, displacing 
almost half of the pupils not from corresponding tenths but from 
corresponding thirds. True it is that some of the displacements were 
near division points but not all of them were and in some cases the 
displacements were from the lowest to the highest third or vice versa. 
And yet many a prognosis has been made on a correlation of .60 or 
even less. 
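
The counting behind a table of displacements from corresponding thirds can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the six scores are invented and are not data from the study, and ties are broken by order of appearance, which the text does not specify.

```python
def thirds(scores):
    """Label each pupil 0, 1 or 2 (lowest, middle, highest third) by rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # lowest score first
    labels = [0] * n
    for rank, pupil in enumerate(order):
        labels[pupil] = min(3 * rank // n, 2)
    return labels


def displacement_summary(test_a, test_b):
    """Count pupils whose third differs between two tests.

    Returns (number displaced, number displaced by two thirds, per cent displaced).
    """
    a, b = thirds(test_a), thirds(test_b)
    displaced = sum(1 for x, y in zip(a, b) if x != y)
    double = sum(1 for x, y in zip(a, b) if abs(x - y) == 2)
    per_cent = round(100 * displaced / len(a))
    return displaced, double, per_cent


# Hypothetical scores for six pupils on two tests (not data from the study).
first_test = [10, 20, 30, 40, 50, 60]
second_test = [15, 55, 35, 45, 25, 65]
print(displacement_summary(first_test, second_test))  # (2, 2, 33)
```

With 64 pupils, 18 displaced pupils give round(100 * 18 / 64) = 28 per cent, the figure quoted for Otis and Miller.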

In order to carry out even further and more precisely the investigation
of this question of differences existing between tests, a procedure
using the regression equation and prophesying from one test what the
score would be in the other was undertaken. Then by subtracting
the transmuted score from the real score there was obtained a measure
of likeness or difference between the two tests considered. In this
case the assumption is made that the correlation is perfect so that we
may get the score from one test which should exactly correspond with
that of the other. "Perfect measurement of a constant quality or
trait would show no difference between two scores of this kind. Such
a difference is a symptom of inaccuracy of measurement or of variability
of the thing measured, or of both. It represents for the teacher
the amount by which he may expect two of these tests to differ in
the measurement of the same pupil" (8, p. 61). Instead of using the
formula $y = \frac{\sigma_y}{\sigma_x}\,x$ used by these authors, the writer
transmuted this formula by placing for $y$ its equal $Y - M_y$, and for $x$
its equal $X - M_x$; thus changed, the formula becomes

$$Y - M_y = \frac{\sigma_y}{\sigma_x}\,(X - M_x),$$

by means of which each pupil's score may be transmuted from one
test to another. The formulas for the transmutations were derived
from the following data:



            Average    SD      Representation
Otis        150.0      5.68    σ1
Alpha       119.0      4.98    σ2
Miller      70.2       3.50    σ4
Terman      128.5      5.94    σ3

and are:

For Otis and Alpha     X1 = 1.14X2 + 14.34    (1)
For Alpha and Otis     X2 = .88X1 - 13        (2)
Otis and Terman        X1 = .96X3 + 21.6      (3)
Terman and Otis        X3 = 1.04X1 - 27.5     (4)
Army and Terman        X2 = .83X3 + 12.35     (5)
Terman and Army        X3 = 1.19X2 - 13.1     (6)
Otis and Miller        X1 = 1.62X4 + 37.3     (7)
Miller and Otis        X4 = .6X1 - 19.8       (8)
Army and Miller        X2 = 1.42X4 + 19.3     (9)
Miller and Army        X4 = .7X2 - 13.1       (10)
Terman and Miller      X3 = 1.7X4 + 9.2       (11)
Miller and Terman      X4 = .59X3 - 5.6       (12)


With these equations before us† it is a simple matter to transmute
scores. For example, the score of our first pupil in the Otis Group
Test is 203 and his score in the Army Alpha is 172. Suppose we
wanted to know what score would be expected on Alpha if Otis and
Alpha measured the same mental processes and were perfect measuring
instruments. By substituting 203 for X1 in equation (2) we get X2 =
.88(203) - 13, which solved becomes 165.64 or 166, which is six points
less than the actual score 172. Again, if we had the Army Alpha
score and wished to know what Otis would be, we use equation (1), which
becomes X1 = 1.14(172) + 14.34. Solving we obtain X1 = 210, or
seven more than the actual score. Differences, then, were computed
for each individual for all the tests taken two together. These
differences were averaged and the average used as an indication of
the discrepancy obtained when a pupil was measured on two tests.
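
The worked example for the first pupil can be checked with a short sketch. The function names are hypothetical; the slopes and constants are those of equations (1) and (2) as quoted in the worked example, and the general `transmute` helper simply restates the regression line with the correlation taken as perfect.

```python
def transmute(score, mean_from, sd_from, mean_to, sd_to):
    """Regression prediction with the correlation taken as perfect (r = 1)."""
    return mean_to + (sd_to / sd_from) * (score - mean_from)


def alpha_from_otis(otis_score):
    """Equation (2) of the text: X2 = .88 X1 - 13."""
    return 0.88 * otis_score - 13


def otis_from_alpha(alpha_score):
    """Equation (1) of the text: X1 = 1.14 X2 + 14.34."""
    return 1.14 * alpha_score + 14.34


# First pupil in the text: Otis 203, Army Alpha 172.
expected_alpha = round(alpha_from_otis(203))  # 166
expected_otis = round(otis_from_alpha(172))   # 210
print(172 - expected_alpha, expected_otis - 203)  # 6 7
```

Averaging the absolute value of such differences over all pupils gives one entry of the kind reported for each pair of tests.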

The large differences between points scored on tests called by the
same name, i.e., "intelligence tests," are even more clearly shown in
Table XIII than in the previous discussion of thirds. In individual
cases the discrepancy between scores is enormous. If we leave out the
71 points, which undoubtedly was due to some failure by the individual
to cooperate, still 46 or 42 or 49 is a very large difference and on the
Army Alpha would correspond to almost three years of mental growth.

Table XIII.—Average Differences between Actual Scores and Scores
Transmuted According to Procedure Described in Text

                                        Greatest     Smallest                Average    Average of three ÷
                                        difference   difference   Average    of three   standard deviation
Army transmuted and Otis actual         88           ...          12.8
Terman transmuted and Otis actual       46           .1           15.0       13.0       2.45 (Otis)
Miller transmuted and Otis actual       30           .68          18.6

Miller transmuted and Terman actual     71¹          .2           ...
Army transmuted and Terman actual       40           .2           17.2       16.4       2.60 (Terman)
Otis transmuted and Terman actual       41           .2           16.2

Miller transmuted and Army actual       46           .1           13.4
Terman transmuted and Army actual       41           .2           ...        18.1       2.33 (Army)
Otis transmuted and Army actual         81           .3           11

Otis transmuted and Miller actual       23           .0           8.3
Terman transmuted and Miller actual     42           .12          7.8        8.6        2.46 (Miller)
Army transmuted and Miller actual       91           .2           9.7

¹ Undoubtedly an exceptional case.

† The writer believes that these equations are of general validity and may be
used to transmute scores from one test to another with the smallest possible chance
of error.

If we attempt to compare the tests by using the average of the
differences of the score in the transmutations of three tests into the
one test, we find almost immediately a difficulty. A casual glance at
the four averages, however, reveals the fact that those tests which
have the smallest range have also the smallest average difference. For
example, Miller has a possible range of 0 to 120, while Alpha has a
possible range of 0 to 212, and Otis from 0 to 230. A change of one
point in Miller is almost equivalent to two in Otis. It seems reasonable
therefore to divide the average difference by the standard
deviation of the actual test into which the three other tests were
transmuted. This method places Otis first, Miller second, Terman
third, and Army Alpha fourth. This may be interpreted, then, that
Otis shows less variation from the expected score than any other test
and in this respect is superior.

To summarize: Large differences are found between intelligence 
tests in placing pupils even in so gross divisions as corresponding 
thirds, there being from 28 to 47 per cent displaced according to the 
closeness of the relationship. Transmutation of scores by assuming 
perfect correlation in the regression equation finds also large differences 
between scores assumedly the same, the range being from 7.8 points 
to 17.2 on the average. The differences in individual scores may run 
as high as 45 or 50. The inference is, therefore, that even when we 
make due allowance for the variability in human nature there is still
a large residuum due (1) to imperfections in the tests, or (2) to
development of the tests along somewhat different lines caused by the 
lack of agreement in the definition of intelligence, or (3) to some other 
undiscovered cause or causes. 

(To be continued in October)



AN EXPERIMENTAL STUDY OF THE RELATIVE
DIFFICULTY OF TRUE-FALSE, MULTIPLE-
CHOICE, AND INCOMPLETE-SENTENCE
TYPES OF EXAMINATION QUESTIONS

H. H. REMMERS, L. E. MARSCHAT, ADELAIDE BROWN and ISABELLA CHAPMAN¹

Colorado College, Colorado Springs, Colorado

In a recently published article,² the present writer urged the desirability
of norms and standards of achievement in the mastery of the
factual material of textbooks. It seems to him desirable to carry over
to classroom instruction and examinations the more objective methods
of the builders of mental and educational tests. McCall in his recent
book³ points out the importance of examinations in the economy of
the work of the schools of this country. Knight⁴ and Barthelmess⁵
have also published articles bearing on the problem here under
consideration. Much of the literature on psychological and educational
research of recent years could be cited as applying more or less directly
to our problem, but it is not the purpose of the present study to give a
summary of this literature.

The writer, while discussing these types of questions in a class in 
educational psychology was met by the objection that they were of 
unequal and undetermined difficulty, and that they were undesirable 
in that they tended to suggest to the examinee the required answer. 
The latter objection made particularly concerning the multiple-choice 
and true-false types of questions, has been met in the references cited 
and will not be treated here. It was determined, however, to secure 
some objective evidence on the relative difficulty of such questions. 


¹ Responsibility for this study is divided as follows: Marschat, Brown, and
Chapman collected the data at the suggestion and under the supervision of
Remmers. The statistical treatment of the data was done by Marschat and Remmers.
For the interpretation and publication of the results the latter is wholly responsible.

² Remmers, Hermann H.: A Suggestion to Writers and Users of Text-books.
School and Society, March 3, 1923, pp. 243-4.

³ McCall, Wm. A.: "How to Measure in Education," pp. 119.

⁴ Knight, F. B.: Data on True-false Test as a Device for College Examination.
Jour. of Educ. Psych., February, 1922, pp. 76-80.

⁵ Barthelmess, H. M.: Reply to a Criticism of Tests Requiring Alternate
Responses. Journal of Educational Research, November, 1922, pp. 367-69.

Subjects and Materials of the Experiment

Fifty-six members of an elementary course in psychology were used 
as subjects. Each of these had been given a mental test (Otis Group 
Test Form A) and on the basis of the obtained scores, the students 
were divided into four comparable groups, 14 in each group. Among 
those competent to judge, the most commonly accepted definition of 
whatever it is that is measured by such tests of “general mental 
ability” is perhaps ability to learn. It was therefore assumed for the
purposes of this experiment, which tested the ability to learn and 
retain certain specific bits of factual material, that these groups were 
approximately equal. It may be argued that this was an unwarranted 
assumption. It will be more profitable to consider this objection after 
the data of the experiment have been presented. 

In the course in psychology the students are required to attend
one laboratory period of two hours each week. The first part of this
laboratory course is given over to a rather intensive study of the
development of the central nervous system. From charts, diagrams,
microscopic slides, and models of the brain and spinal cord the students
were required to make and properly label a specified number of drawings.
They were also required to hand in with each drawing five true-false
statements, five multiple-choice statements, and five incomplete
sentences on the material studied. All labels on the drawings were
carefully checked, and no drawing was finally accepted and graded
until all labels were entirely correct. In this way it was assured that all
students were exposed to all the material used in the experiment. It
was from the examination questions that the students themselves
handed in that the test questions used in the experiment were selected.
Sixty questions each, of all three types calling for the same specific
bit of information, were culled from the mass of material at hand.
A sample of each type of question follows:

True-false.—The anterior third of the embryonic spinal cord forms
the brain.

Multiple-choice.—The anterior (third, half, two-thirds) of the embryonic
spinal cord forms the brain.

Incomplete-sentence.—The anterior ______ of the embryonic spinal
cord forms the brain.

Method and Data

The 60 questions of each of the three types were mimeographed,
and about six weeks after the portion of the laboratory course dealing
with the development of the nervous system had been completed, the
four groups of students used in the experiment were examined without
having been given previous warning. The questions were distributed
as follows:

Groups 1 and 2    Form B (multiple-choice)
Group 3           Form A (true-false)
Group 4           Form C (incomplete-sentence)

The students, already familiar with the various types of questions,
were instructed as follows:

"What we are about to carry out is not an examination, but an
experiment. Your performance in this experiment will in no
way affect your grade in the course. We are merely attempting
to discover the relative difficulty of the different types of questions
that have been passed out. In order to have the experiment a
success, however, it will be necessary for you to do your best."

A spirit of cordial cooperation was apparent among the participants.
Sufficient time was allowed for all to finish. Table I gives the
mental rating (raw scores), the scores on the test questions, and the
averages and variability in terms of SD for each group.

How do we know whether the difference between any two of the
average scores is a significant one? And if significant, by how much?
Specifically, is there any warrant for thinking that the difference
between the average scores of groups 1 and 2 is or is not significant?
By using McCall's technique for calculating the "experimental
coefficient"¹ we find that there is not; the chances are approximately 1 to 1
that this difference of 1.5 means nothing. The formula for this
coefficient follows:

$$\text{Experimental coefficient} = \frac{\text{diff.}}{2.78\,\sigma_{\text{diff.}}}, \qquad \text{where } \sigma_{\text{diff.}} = \sqrt{\sigma_1^2 + \sigma_2^2}.$$
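
A rough numerical sketch of the experimental coefficient may help here. It rests on several assumptions that are not stated in the text: McCall's constant is taken as 2.78, the two sigmas are entered exactly as the formula shows them, the "approximate chances" are read from the normal curve, and the two standard deviations used below are illustrative values only. The function names are hypothetical.

```python
import math


def experimental_coefficient(mean_a, mean_b, sd_a, sd_b):
    """Difference of averages divided by 2.78 times the sigma of the difference."""
    sd_diff = math.sqrt(sd_a ** 2 + sd_b ** 2)
    return abs(mean_a - mean_b) / (2.78 * sd_diff)


def approximate_chances(coefficient):
    """Odds (to 1) that the difference is a true one, from the normal curve.

    Assumes a coefficient of 1.0 corresponds to a difference of 2.78 sigma.
    """
    z = 2.78 * coefficient
    p = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return p / (1 - p)


# Averages of groups 1 and 2 from the text; the SDs are illustrative only.
ec = experimental_coefficient(33.57, 32.07, 4.7, 6.8)
print(round(ec, 2))  # 0.07
print(round(approximate_chances(ec), 1))
```

On this reading a coefficient near .07 gives odds only a little better than even, which agrees with the "1 to 1" reported for groups 1 and 2.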


¹ Op. cit., pp. 404.





Table I.—Showing the Mental Rating, Score on Test Questions, Averages
and Standard Deviation for Each Group of Subjects

Columns, left to right: Form A, Group 3; Form B, Group 1; Form B, Group 2;
Form C, Group 4. Under each group: Student, Mental rating, Score.


204 

2& 

On 

207 

% 

|H1 

212 

41 

Wl 


21 

Go 

200 

19 

rm 

204 

31 

' Mi 

197 

gMl 

Ho 

200 

20 

Mu 

109 

& 

Bn 


as 

Ni 

197 

32 

Wo 

193 

28 

Ny 

198 


Da 

!9 

47 

No 


33 

Wn 

104 

26 

St 

■ 


Br 

£9 

33 

Pu 


■■Jf 1 


[EK 

0 

Mo 

9 


Wh 

181 

36 

Cl 

184 

27 


180 

20 

11 

9 


Fn 

181 

34 

Cr 

181 



180 

26 

Oa 

170 

23 

LI 

176 

88 

Ro 


28 

Sn 

170 

18 

Tq 

1S8 

17 

Co 

176 

35 

El 

172 

36 

Pa 

EEB 

27 

Mb 


14 

Mo 


31 

KO 


28 

Co 

109 

20 

Po 

103 

9 

Or 

168 

84 

Ho 


30 

Ro 

lOS 

10 

No 

lot 

13 

8W 

166 

22 

Re 


32 

Wr 

106 

20 

Ed 

168 

11 

Cl 

166 

26 

Ed 


30 

Dr 

167 

19 

Mo 

147 

13 

Cp 

m 

33 

Fn 

136 

26 


126 

9 

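Average: Group 3, mental rating 177.79, score 14.00; Group 1, mental rating
177.78, score 33.57; Group 2, mental rating 177.78, score 32.07; Group 4,
mental rating 177.78, score 20.3.

SD: 6.6, 4.7, 6.8 (remaining entries illegible).
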

Table II gives the data for all the differences.


Table II.—Showing the Experimental Coefficient of the Differences
among the Four Groups, and Their Relative Significance in Terms
of Approximate Chances

Groups     Average scores     Experimental coefficient   Approximate chances
1 and 2    33.57 and 32.07    .07                        1 to 1
1 and 3    33.57 and 14.00    ...                        76 to 1
1 and 4    33.57 and 20.30    ...                        17 to 1
2 and 3    32.07 and 14.00    ...                        ...
2 and 4    32.07 and 20.30    ...                        16 to 1
4 and 3    20.30 and 14.00    .26                        3 to 1


From the above it is apparent that, on the assumed equality of the
four groups, the probabilities are very high that there is a significant
difference between the difficulty of multiple-choice and true-false
statements; that the probability that the incomplete sentence is more
difficult than the multiple-choice type of question is substantial;¹
and that the difference between incomplete sentences and
true-false statements is not great enough to be regarded as significant.

It might be argued that it follows from the assumption of approximate
equality of the four groups as determined by Otis scores, that
there should be a high positive correlation between these mental ratings
and the experimental scores. This was not found to be the case.
Another criterion that was available was the semester grades of the
students. Using the formula $R = 1 - \frac{6\sum G}{N^2 - 1}$, the
correlation of each of these criteria (i.e., Otis scores and semester
grades) with the experimental scores was found as listed in Table III.
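
The rank formula for R can be sketched in a few lines, assuming that G stands for the gains in rank (the positive rank differences only), as in Spearman's "footrule." The function names are hypothetical, the example scores are invented, and ties are not handled.

```python
def ranks(values):
    """Rank 1 goes to the highest value; ties are not handled in this sketch."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position + 1
    return result


def footrule_r(first, second):
    """R = 1 - 6 * (sum of gains in rank) / (N^2 - 1)."""
    rank_a, rank_b = ranks(first), ranks(second)
    gains = sum(b - a for a, b in zip(rank_a, rank_b) if b > a)
    n = len(first)
    return 1 - 6 * gains / (n ** 2 - 1)


# Hypothetical scores for five students (not data from the study).
print(footrule_r([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # 1.0
print(footrule_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -0.5
```

The second call shows why R is less accurate than the product-moment r: a perfectly inverse ordering of five cases yields -0.5 rather than -1.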


Table III.—Showing the Correlation between Otis Scores and Experimental
Scores and between Semester Grades and Experimental Scores

           R between Otis scores       R between semester grades
           and experimental scores     and experimental scores
Group 1    +.262                       +.339
Group 2    +.047                       +.370
Group 3    +.154                       +.247
Group 4    +.324                       -.123


In view of the small number of cases in each group it was not considered
worth the statistical labor to use the more accurate product-moment
method of correlation. One such r was calculated between
Otis scores and experimental scores (Group 3, Form A); r = .176 ± .201.
The difference between R and r is too slight to warrant the work
involved. Obviously the PE of each of these coefficients is so large as
to make them highly unreliable. By using McCall's Transmutation
Table² R may be transmuted to r, and PE calculated from the usual
formula:

$$PE_r = .6745\,\frac{1 - r^2}{\sqrt{N}}$$

¹ A theoretical consideration concerning the method of scoring should be taken
up here. Just as in the true-false statements half of the answers would be correct
by pure chance, so in the multiple-choice statements one-third of the answers would
be correctly answered if only chance operated, since only three answers were
possible to each question. If this had been taken into account in scoring the multiple-choice
statements, the average scores for groups 1 and 2 would have been 22.38 and
21.38 respectively, and the difficulty of this type of statement would then be
very nearly that of the incomplete-sentence type. In the ordinary standardized
mental and educational test this factor of chance insofar as it concerns multiple-choice
statements is usually disregarded.


Does it follow then that the assumption that these four groups
were equal is invalidated? Not necessarily. There is a reasonable
supposition, at least, that our assumption was substantially correct
inasmuch as groups 1 and 2, both tested with Form B, were proved to
be practically equal so far as this test is concerned. The two available
criteria of equality (Otis scores and semester grades) have apparently
about the same value. An extension of this experiment³ is required to
demonstrate conclusively that the obtained differences represent valid
differences. The data of this experiment do, however, furnish a strong
presumption in the support of such a conclusion.

² Op. cit., pp. 393.

³ I shall be glad to receive criticisms of this study with a view to extending the
investigation here reported.



A UNIFORM OBJECTIVE EXAMINATION ON 
INTELLIGENCE TESTING 

DENTON L. GEYER 
Chicago Normal College

A standardized examination in the field of intelligence testing would
allow the instructor to compare his class with similar classes elsewhere;
would enable him to experiment with different methods of teaching;
and, as compared with the essay examination, would cover more
ground, save time in marking papers, and furnish an objective basis
for dealing with indolent or incompetent students. In brief, a standardized
examination would have the same advantages in college that
it has elsewhere. But in colleges it would, of course, attempt no more
than to cover certain agreed-upon minimum essentials, leaving each
instructor free to supplement this section of the course in any way
he chose.

As an initial step toward such an examination, a true-false test of
90 statements and a multiple-answer or recognition test of 60
statements, based on the Yearbook on Intelligence Testing issued
by the National Society for the Study of Education, were mailed to
summer school classes in July, 1922, for preliminary try-out and
standardization. The norms thus secured are shown in Table I.


Table I.—Scores and Percentile Ranks for 640 Students in 19 Universities,
Colleges, and Normal Schools

Percentile rank             10    20    30    40    50    60    70    80    90
Score on multiple answer    30    33    36    38    40    42    44    47    50
Score on true-false         26    34    ...   ...   50    54    58    62    70

Table II.—Frequency of Differences Among Colleges in Percentage of
Class Missing Each Question of the Multiple-answer Type

Whether there is sufficient uniformity in the content of various
courses on intelligence testing to make standardization feasible was
investigated by computing the percentage of each class which missed
each question. Assuming that answers can not be easily guessed, or
supplied from general information, if the number missing a given
question in various institutions is about the same, the distribution of
emphasis among various topics is presumably similar. These results,
presented for the multiple-answer test in Table II, show that when the
percentage of a class missing each question in each of four representative
institutions is subtracted from the percentage missing the
corresponding question in another of these institutions, the remainders
are very small. About one-half of the remainders are less than
10, about three-fourths are less than 20, and about nine-tenths are
less than 30. The agreement for the true-false test is slightly closer.
Only one of these four institutions had used the "Yearbook" as a
textbook. These results would seem to show that the courses now
given on intelligence testing are much more alike than we might have
supposed.
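
The comparison just described, subtracting one institution's percentage missing each question from another's and tallying the remainders, can be sketched as below. The function name and the four percentages per institution are hypothetical and serve only to show the tally.

```python
def tally_remainders(pct_missing_a, pct_missing_b, limits=(10, 20, 30)):
    """Share of questions whose remainder (absolute difference) is under each limit."""
    remainders = [abs(a - b) for a, b in zip(pct_missing_a, pct_missing_b)]
    return {limit: sum(r < limit for r in remainders) / len(remainders)
            for limit in limits}


# Hypothetical percentages of two classes missing each of four questions.
college_a = [10, 50, 80, 35]
college_b = [15, 35, 45, 60]
print(tally_remainders(college_a, college_b))  # {10: 0.25, 20: 0.5, 30: 0.75}
```

Applied to every pair among the four institutions, shares near one-half, three-fourths and nine-tenths would correspond to the figures reported in the text.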

To get some light on the validity of these tests, their scores were
correlated in the Chicago Normal College with average scholarship
for the previous semester. The true-false type, in an earlier and
presumably less perfect edition, correlated at 0.31 (n = 56); and the
multiple-answer type in the edition sent to summer schools at 0.45
(n = 30). A more recent revision of the multiple-answer type (one
that eliminates items shown by the summer returns and by requested
criticisms to be too easy or too hard or debatable or of minor
importance) correlates with scholarship at 0.51 (n = 66). These
correlations for the multiple-answer type are as close as those in this college
between any two studies. Observation through several semesters
has also shown that students who have failed in some other course are
usually placed by this test in the lower quintile, and that occasional
graduate students in a class of sophomores are placed in the upper
10 per cent.

A true-false test of 90 items, though apparently satisfactory for
rating a class as a whole, is probably too much affected by guessing,
as Dr. Hahn has recently pointed out,¹ to be of much value in rating
individuals. If this test is similarly revised it will therefore be
greatly lengthened.

In the revised form of the multiple-answer test guessing has been
minimized by supplying four or more ways of completing each statement,
and no statement has been retained which does not seem to
contribute some item of importance. The field covered is not now
limited by the distribution of emphasis in the "Yearbook" but is in
intent determined by considerations as to what the minimum essentials
are in the field of group intelligence testing. Giving the revised form
to one class before it has taken the older test, and to another class
after it has taken the older test, shows that the norms of last summer
apply to the revision without change.

Of course much better tests than these could be made by a committee
drawing material from the objective examinations now used by
the numerous persons teaching this subject. Would it be worth
while to have such a committee created?

¹ Hahn, H. H.: Criticism of Tests Requiring Alternate Responses. Journal
of Educational Research, Vol. VI, October, 1922, pp. 236.

NOTE UPON HOLZINGER'S FORMULA FOR THE
PROBABLE ERROR

TRUMAN L. KELLEY
Stanford University


In Dr. Holzinger's article, An Analysis of the Errors in Mental
Measurement, appearing in this journal, May, 1923, occurs a new
formula for the probable error of the mean:

$$PE_M = \frac{.6745\,\sigma_1\sqrt{2 - r_{12}}}{\sqrt{N}} \qquad (6)$$

One important point seems to have been overlooked in the derivation
of this formula. In particular the use, in this connection, of the
equation $P.E._{a+b} = \sqrt{P.E._a^2 + P.E._b^2}$ (found near the foot of p. 284)
is not warranted.

Using the same notation as Holzinger, we have $X_1 = X' + \delta_1$,
in which $X_1$ is the obtained score, $X'$ the individual's true ability and
$\delta_1$ his "response error." The individual's true ability, $X'$, may be set
equal to $M_1 + x'$, the mean for the group plus the true deviation of the
individual from the group mean. If we possessed true scores for all
the members of the group, $x'$ could correctly be called the error of
sampling, i.e., the taking of an individual true score as evidence of the
mean score would involve an error of this magnitude, $x'$. It is in this
sense that Holzinger, according to my interpretation, has defined the
"error of sampling." However, he considered the standard deviation
of the error of sampling when thus defined to be $\sigma_{x_1}$, but this is not the
case when fallible measures are used.

$$X_1 = M_1 + x' + \delta_1, \quad \text{or} \quad X_1 - M_1 = x' + \delta_1$$

If we let $x_1 = X_1 - M_1$ this may be written,

$$x_1 = x' + \delta_1$$

and if $x'$ and $\delta_1$ are uncorrelated we have,

$$\sigma_{x_1}^2 = \sigma_{x'}^2 + \sigma_{\delta_1}^2$$

Thus $\sigma_{x_1}^2$ is not equal to the square of the standard error of sampling, as
the term has been defined by Holzinger, but is equal to the sum of two
things, $\sigma_{x'}^2$ and $\sigma_{\delta_1}^2$. Since $\sigma_{x_1}^2$ has involved in it this quantity
$\sigma_{\delta_1}^2$, it should not be added a second time as has been done to obtain
equation (6).

"Sampling error" in terms of its derivation and as regularly treated
statistically means more than the error involved in taking a single true
score as indication of the mean score. We should use some such
expression as "the deviation of the individual in true ability from the mean"
and not "sampling error" to designate that part of the total deviation
from the mean which is not represented by the "response error."


REPLY TO PROFESSOR KELLEY’S CRITICISM 

KARL J. HOLZINGBR 
■University of Chicago 

I wish to thank Professor Kelley for his critical scrutiny of formula 
(6). It seems to me, however, that if the formula fails it is due rather 
to certain assumptions in the proof than to the erroneous addition of 
errors as he suggests. To fix ideas, I shall use his notation and 
equations as far as possible. Let 

2/1 = i 5 

where a:; = Yi - Mi, 8 ~ Zi - and - Mi = “deviation 

of the individual in true ability from the mean.” Now if we write 
a;i s= :p 5 and assume xi and 5 uncon'elated, it is evident that 

ffx* = <r** + cji 

But fft» ~ — rjs) 

by equation (3) in my article. Therefore, 

ff*,! “ <rx,i(2 — Tis) (1) 

and formula (6) follows at once, since ffXi = <tx. Thus Professor 
Kelley's reasoning may lead him to the formula to whichhe is objecting. 

Suppose, however, that we write «! — ± S and assume that 

and S are unoorrelated. It then follows that 

= <r»,i + Cat 

or, ffx,, — — ffjj 

Bnt, (Tgt = (r,,i(l — ris) 

as cited above. Therefore, 

fr»i® — ffiVia 


( 2 ) 
Kelley¹ has worked out a lengthy proof of formula (2) and expressed
it in terms of $\sigma_\infty$, but the formula is wrongly ascribed to him
by Thorndike and others inasmuch as it is identical with formula (5)
on page 213 in Yule. It is, as Yule points out, merely a part of
Spearman's formula for the correction of the correlation-coefficient.
That such careful workers as Thorndike and Kelley should have
overlooked Spearman's formula is very regrettable.

Returning to the main argument, it appears that the two assumptions
cited above lead to widely different results, and if formula (1)
is at fault it is because of a wrong assumption, i.e., that $x_1$ and $\delta$ are
uncorrelated. As a choice between the two assumptions it would
seem more reasonable to expect $x'$ and $\delta$ to be uncorrelated than
$x_1$ and $\delta$, inasmuch as $\delta$ is a part of $x_1$. Furthermore, I myself found
some evidence of correlation between $x_1$ and $\delta$ (page 283 of article
under discussion). I therefore suggest that formula (1) be held in
abeyance until further evidence is at hand. I do not recommend the
use of formula (2) as a substitute because, even though more reasonable
than (1), assumptions as to the correlation of errors are still not
clearly justified.² Much more careful inquiry is needed in the subject,
and if this discussion in any way prompts such study it may not be
entirely amiss.


Editor, The Journal of Educational Psychology,

Sir:

I was amazed at the sort of review my "Behaviorism and Psychology" received
at the hands of Mr. A. I. G. whom, did the initials not give me some clue as to
his identity, I might have taken to be one of those elementary students who
"will of course find such a book unintelligible." Mr. A. I. G. implies that my work
is not thorough because it is "based mainly on a series of controversial articles and
a few books" (viz., over 200). Does the reviewer think I might have made use of
books and articles that have not yet been written in order to do justice to the
behaviorists? Ah, yes, I have been stupid enough to discuss books and articles instead
of "fundamental trends," and what is more I said nothing about Judd's writings on
motor attitudes. True enough, nor have I mentioned the doctrine of
transubstantiation or the Ptolemaic system of astronomy. Goodness knows why I must
consider in a critique of behaviorism all the psychologists that Mr. A. I. G. can
think of. And why, pray, must I dwell at length on "such an extensive theory as
those (that?) of Washburn in Movement and Mental Imagery" when I have referred


¹ Kelley, Truman: "The Reliability of Test Scores." Journal of Educational
Research, May, 1921, pp. 379.

² Brown and Thomson: "Essentials of Mental Measurement." pp. 160.
to passages in this very book as well as to several articles, in addition, by the same
author to show that she is decidedly hostile to behaviorism?

The reviewer's statements and claims are astonishingly flimsy. What does he
mean by saying "In treating the 'Varieties of Behaviorism' and other topics, we
are given (he gives us?) a series of quotations some of which may have been
unwittingly (sic) made ..." when I have done my best to select the most
characteristic passages in every case; and what else could I have done than to
content myself with very brief expositions of the various behavioristic writers, if
the book was not to become so bulky as to prevent its publication? In order to
represent the movement of behaviorism fairly, I cited and characterized the views
of about 60 behavioristic writers of various shades and tints. Had I devoted only
two pages on the average to each writer, the chapter on varieties of behaviorism
alone would have exceeded 100 pages.

Perhaps had the reviewer been less averse to logical treatment and systematic
development of a subject, he would not have taken exception to my examining the
logical setting of behaviorism nor would he have regarded as irrelevant the
appendix "How is Psychology to Be Defined?" He certainly would not have seen in
the book any discussion of the "incompatibility of behaviorism and philosophy,"
nor would he have demanded "a comprehensive view of movements within the
science itself" (what science?). Mr. A. I. G. does not seem to realize that my
task was to examine the behavioristic movement and its relation to psychology,
and not to delve into the foundations of psychology.

To conclude, a brief review does not necessarily have to be a perfunctory and
carelessly written notice. A serious approach on the part of the reviewer is essential
in every case whether the review be sympathetic or adverse.


Harvard University 


A. A. Roback,
May 8, 1923. 



NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


CONDUCTED BY LAURA ZIRBES

Economy in Secondary Education.—Certain experimental activities
carried on during recent years in the laboratory high school of the
University of Chicago have been described by members of the faculty
in a set of reports of unusual interest and suggestiveness. These
reports¹ deal with four major lines of experimentation seeking
respectively: (1) "Better and more economical material," (2) "Better and
more effective technique of teaching," (3) "individual personnel
study," and (4) "more effective institutional organization."

Outlines of a two-year sequence in history, now constituting
practically the whole offering in that subject in the school, are
presented and discussed by H. C. Hill and A. F. Barnard. The first of
these courses, elective for sophomores, is a Survey of Civilization which
stresses community life in different periods from primitive man down,
but includes relevant narrative as well. It is a venturesome departure
from the traditional in high school, but, when the strict limitation
of time for history is considered, it appears to be remarkably well
designed to give the perspective and the background needed for an
intelligent study of modern events and conditions. The second
course is one in Modern History covering "the chief movements and
most significant features in the history of the United States and Europe
since the middle of the eighteenth century." Both courses are divided
into seven or eight units of work, a mastery of the minimum essentials
of each being required before credit is given. Reference books found
to be particularly helpful in the course are referred to in the discussions.

The effort to improve the technique of teaching in the school has
resulted in the adoption of a procedure calculated to insure pupil
mastery of units of work, to take account of individual differences,


¹ "Studies in Secondary Education. I." University High School, University
of Chicago. Supplementary Educational Monographs, No. 24. Chicago: Department
of Education, University of Chicago, 1923.
and to develop habits of effective independent study. Credit for this
procedure is given several times in the monograph to Professor Henry C.
Morrison, Director of Laboratory School. Its application in history
is clearly described in the report of Hill and Barnard; in Elementary
Science by W. L. Beauchamp; and in English Literature by Ernest E.
Hanes and Martha J. McCoy. This procedure, Herbartian in descent,
calls for five explicit steps in instruction: exploration, a preliminary
probing for existing relevant knowledge; presentation, a prospectus
from the teacher of the unit of work to be covered; assimilation, the
real work of study; written organization or recitation; and oral
recitation. Of these five, assimilation is regarded as much the most
important and occupies from one-half to three-fourths of the time devoted
to a unit. During assimilation the class may work for weeks without a
recitation as commonly understood. In fact, throughout the procedure
there is very little to remind one of the question and answer recitation
to which most of us are still addicted. Emphasis is laid upon each pupil's
mastering all minimum essentials for himself. The testing of his
accomplishment is searching, and failure to show mastery means
re-study and re-testing. We have here no soft or sugar-coated pedagogy.

In a report of a controlled experiment to determine the effect of
teaching certain methods of study in Elementary Science, Beauchamp
presents data tending to show the very real effect of specific
attention to this sort of training. Pupils so trained do develop in a
short time better comprehension, better organization, and greater
power to apply what they read. The final report in the book is by
Hanes and McCoy, on the teaching of a unit in English Literature
(the essay) to classes of boys. It gives one the hope that high school
instruction in English Literature may some day elude the cynics and
jokesters by actually producing pupil enjoyment and the voluntary
reading of literary material. Informed and inductive are the adjectives
these instructors justly apply to the method which they describe.

Individual personnel study is regarded by W. C. Reavis, Principal,
and Elsie M. Smithies, Assistant Principal, as one of their most
important functions. They present typical case studies and some
striking data from the results of this kind of work. The completeness
of their investigations of problem pupils is beyond the reach of most
public high school principals. One feels, however, after reading this
matter on constructive student-accounting that we could very well
be doing much more than we ordinarily do to save high school pupils
from scholastic disaster.
In the introductory chapter of the monograph Professor Morrison
tells what has been done by way of institutional reorganization to
effect an economy in time. The elimination of the eighth grade some 
years ago has been followed by the grouping of the seventh grade 
with the high school proper, making a five-year high school. In 
addition to this, by such economies in subject material as Hill and 
Breslich report, it has been made possible for capable students to 
secure more or less junior college credit by the time of graduation. It 
seems possible that the whole period of high school and elementary 
education in these laboratory schools may from now on be shortened 
to 10 years or the equivalent. The possibility of such a saving becomes 
every year more significant to those who are struggling to obtain 
support for public education. 

This whole monograph contains such substantial material, is so 
restrained in its conclusions, and is so provocative of thought that 
anything like a comprehensive review of it is impossible. It deserves 
careful reading by everyone engaged in secondary education. After
reading it one should feel, not that conditions in this experimental school
with its relative abundance of equipment, excellence of teaching force,
and selected student body remove its procedures from possible duplication
in public, financially restricted high schools, but rather that
the results in such an institution may in time come to be appreciated 
by patrons of our public schools enough to induce them to provide 
what is needed for the similar education of the children of their 
own communities. 

Springfield, Illinois. M. H. Willing. 


2. Industrial Needs and Present Educational Limitations.¹—Again
Dr. Link has produced a book worth reading. Like his "Employment
Psychology," his "Education and Industry" not only represents the
most significant factors in the field so far developed, but also suggests
new avenues for progress. Fortunately in Dr. Link we have that rare
combination of a professionally-trained teacher and industrially-trained
executive—that combination so essential to the solution of
the problems of education in industry.

Primarily the book aims to propose a plan for the realization of the 
much needed good will between the employee and the employer; 

¹ Link, Henry C.: "Education and Industry," The Macmillan Company,
New York, 1923, pp. XV + 265.
and, secondly, to lead the way in showing how to train workers in
the various endeavors which are so vital to industry. To accomplish
the first end the author thinks general education of the worker defined
in the broadest sense possible is essential; while vestibule schools,
trade schools of the most modern nature for the executives, foremen,
department heads, salesmen, etc., are necessary for the realization of
the second ideal. The community as a whole as well as the individual 
industries must cooperate to accomplish this double purpose. And 
this is only fair since the community as well as the industry benefits 
from an improved education. 

The thoroughness of the book is such that the reader is tempted to
wonder at the omission of certain matter of interest which of course
could hardly have been unknown to Dr. Link—such as the important
source of education offered by the workers themselves—the Labor
University, in the city of Boston, the Rand School, and Labor Temple, in
New York City, educational endeavors of the American Federation of
Labor, and the Amalgamated Clothing Workers of America, etc., etc.
While it is true that of the educational facilities supplied by the
organizations of laborers, a great deal is propaganda, nevertheless,
much is not. Just what Dr. Link has done to separate the good from
the bad in the educational methods fostered by the employer, he
might have done for the educational facilities supplied by labor, and
we regret his not having done this, because he would certainly have
done it well.

The book as a whole, I am sure, cannot be ignored by any student 
of industrial education and the specialist in the general field of 
education will find Dr. Link's suggestions greatly stimulating.

Northwestern University. A. J. Snow.


3. A Summary of Educational Psychology.¹—Because of the tremendous
strides which have been made in the last few decades in the
science of Educational Psychology, the writer of a textbook in the
subject is faced with a serious dilemma. If he treats each topic with
the detail it deserves, his book is in danger of assuming the proportions
of an encyclopedia; if he attempts to summarize, he is in danger of
writing a treatise which is on the one hand too technical for the

¹ Mead, A. R.: "Learning and Teaching," Lippincott Publishing Co., 1923,
pp. 277.
beginner and on the other, too sketchy for the advanced student.
The writer of "Learning and Teaching" has fallen into the latter error.
The book attempts to cover every topic of importance which would
be included in his title; the result is a serious curtailment of detail
on important subjects. For example, in the summary of a section
of a chapter on page 16, he indicates that he has covered the topic,
"the relation between drill and habit formation." This important
topic is treated in but six lines on page 16. The entire subject of
"Original Nature and Education" is handled in eight pages of text.
The topic of fatigue is left out of the text entirely, but a series of
difficult questions on the subject is included with references to
Thorndike, Starch and others. Possibly it would have been wiser to have
said nothing about the topic if it could not be treated fairly adequately
in the text.

The questions, exercises and experiments at the end of each chapter
seem to the writer to be excellent material for psychology classes.
This material is unusually voluminous, and in assigning subjects for
study, and class and group discussions the psychology teacher will
find it of great help.

Edwin H. Reeder. 

Teachers College, 

N. Y. C. 
A STUDY OF THE DOWNEY TEST BY THE
METHOD OF ESTIMATES

NORMAN G. MEIER
State University of Iowa

The growing interest which has been evinced in the Downey
Will-Profile¹ (Will-Temperament Test) since its publication in 1919 is
perhaps attributable to the current interest in tests which may serve to
supplement the measures of general intelligence, and to the fact that it
is the most systematic attempt yet made in the appraisal of
temperamental qualities. This field of volitional and temperamental
measurement presents complex and difficult problems, so that it seems
somewhat unlikely that any test will ever sample the variables which
constitute temperament with a high degree of certainty. The study
reported herein² is an attempt to ascertain the degree of reliability
which this promising test possesses.

A consideration of the previous studies of the test³ gives a variety
of conclusions concerning the value of the test, and reveals certain
limitations and shortcomings in procedure, unforeseen at the time,
from which this study has profited. Space limitations forbid a critical
review, but the points of departure will be found in the context.


Experimental Procedure 

Two procedures may be followed to secure data upon the reliability 
of the test. The first consists of giving other objective tests of known 
validity that would provide measures of traits similar to or identical 

¹ Downey, June E.: "The Will-Profile, A Tentative Scale for the Measurement
of the Volitional Pattern," Psychology Bulletin No. 3, University of Wyoming.

² At the University of Chicago, under the direction of H. A. Carr and F. N.
Freeman.

³ See Bibliography, under Bryant, Clark, Downey, and Ruch.
with those in the Downey scale. The degree of agreement of the two
sets of tests would be the basis for drawing the inference of validity.
Certain difficulties, however, render this procedure impracticable:
chiefly, that the time requirements for each person to undergo all the
necessary tests would be prohibitive, and that a sufficient number of
such tests are not yet available.

The second procedure, followed in this study, consists in first 
administering the test to a comparatively large number of subjects, 
and then securing for comparison purposes three independent estimates 
from persons who know the subjects well. This method admits of the 
possibility of an uncertain conclusion: that is, in the absence of clues to
the contrary, it may be impossible to ascertain whether greater validity
is possessed by the test-scores or the estimates.

Because of its obvious significance for vocational and educational 
guidance and as its application would be peculiarly appropriate to 
adolescents, it was decided to use high school students for subjects.
Through the cooperation of the Superintendent of the Laboratory 
Schools and the Principal, the test was given in individual form to 
106 students in the University High School. Of these, 64 took the test 
in group form later. This high school draws its attendance from the 
city of Chicago and environs; about 10 per cent come from University 
faculty homes, the remainder from diverse situations in life. Of the 
number tested, 62 were boys; 64 girls. The median age was 16, the 
distribution ranging from 13 to 20. The median mental age was (94 
cases) 17 years, 3 months, and the median IQ 112.0.¹ In scholastic
position they ranged as follows: 

Post-graduate .....   1
Seniors ...........  27   25.5 per cent
Juniors ...........  37   35 per cent
Sophomores ........  19   18 per cent
Freshmen ..........  17   16 per cent
Sub-freshmen ......   5   4.7 per cent
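The class percentages in this list can be re-derived from the counts. The short sketch below is only an arithmetic check; the dictionary name and the one-decimal rounding are illustrative choices, not part of the original report.

```python
# Counts of subjects by scholastic position, as listed in the text.
class_counts = {
    "Post-graduate": 1,
    "Seniors": 27,
    "Juniors": 37,
    "Sophomores": 19,
    "Freshmen": 17,
    "Sub-freshmen": 5,
}

total_subjects = sum(class_counts.values())

# Share of each class, in per cent, rounded to one decimal place.
class_percents = {
    name: round(100 * count / total_subjects, 1)
    for name, count in class_counts.items()
}

for name, pct in class_percents.items():
    print(f"{name:<14}{class_counts[name]:>4}{pct:>7.1f} per cent")
```

The counts sum to the 106 students reported as tested, and the computed shares agree with the printed percentages to the precision given there.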

The names of subjects were drawn, with few exceptions, at random
from the school files by the office assistant. The subjects were in
nearly all cases quite naive as regards this test—a condition which rules
out the disturbing elements of anticipation, of the tendency to analyze

¹ The Terman Group Intelligence Test was given by Mr. Wm. C. Reavis,
Principal of the U. H. S. The IQ was obtained from the point score by tables.
purposes as one proceeds, and of over-zealousness to do the right thing
while missing the real import of given directions.

Other conditions were maintained with the view toward having all
circumstances favorable for best results. Distractions were reduced to
a minimum. The administering of the individual form was done after
the experimenter had given the test to 24 other subjects, so that
he was neither unfamiliar with the test nor inexpert in giving it. The
subject was kept in ignorance of the purpose of the test. Time was
kept by a laboratory stop-watch, reading to one-fifth second.

The names of persons who were to supply estimates of the traits 
were secured from the subject. These raters were (a) the teacher who 
knew the subject best; (b) a parent; and (c) some friend (not a teacher)
over 18, who had known the subject for at least two years. In this 
third class were included older friends of the family, scout masters, 
Sunday-school teachers, relatives not in the same household, and 
older chums. 


Method of Securing Ratings

To find a method for securing reliable ratings proved to be a difficult
problem. A searching of the literature failed to reveal any
extended descriptions of the traits, not excepting the writings of Dr.
Downey herself. The theoretical discussion in the original bulletin¹
and the journal article² lack a complete and definite statement of
just what the traits mean. A further objection to their use may be
raised in that they are couched in psychological terms, the meaning of
which is not agreed upon even among persons in the same psychological
laboratory. Discussions with persons not psychologically trained
indicated a much greater obscurity in the meaning of these terms,
particularly "motor impulsion," "freedom from load," and "volitional
perseveration"; hence the obvious necessity of first putting
these expressions into non-technical language.

It is essential that Rater X have the same or approximately the 
same idea of each trait as Rater Y if comparable data are to be 
obtained; hence an attempt was made to formulate as complete and
understandable a statement of each trait's meaning as possible. The
material consulted comprised the articles referred to, the revised 

¹ The Will-Profile, op. cit.

² Downey, June E.: Some Volitional Patterns Revealed by the Will-Profile,
Journal of Experimental Psychology, Vol. III (1920), pp. 281-301.
Manual,¹ lecture notes,² and the treatment of some of the traits in texts.³
When drawn up they were presented to competent critics for examination
and suggestions.⁴ When the form had undergone four
revisions it was sent to Dr. Downey who, after minor criticisms,
approved it.⁵

In constructing a satisfactory rating device to embody the three
conditions laid down by Rugg⁶ it was decided after some experimentation
to adopt a modification of the man-to-man comparison type of
scale. Instead of requiring the rater to construct twelve master-scales
(one for each trait), he was asked to recall all the friends and
acquaintances he could who are fast (or slow)—if the matter involves
speed of reaction—in this particular trait, and to place the person
rated, appropriately, in comparison with them. The judgments were
to be registered on a graphic rating scale, which would encourage
greater freedom in use of the entire range rather than the middle
points predominately, and which provides finer numerical units.
The preliminary instructions were designed to free the rater from all
presuppositions regarding the nature of the ratings desired; furthermore,
to bait his interest, and to convey some idea of what would be
expected in a normal distribution of degrees of traits.

¹ Manual of Directions, Downey Will-Temperament Test. Yonkers, N. Y.,
World Book Co., 1921.

² Course, by Professor Downey, University of Chicago, Summer, 1920.

³ Motor impulsion and inhibition, in James, Psychology, Chap. 21, N. Y.,
Holt, 1890. Other traits are treated in Ribot, Psychology of the Emotions, Chap.
7, N. Y., Scribners, 1898.

⁴ The writer wishes to acknowledge his great indebtedness to Professors Carr and
Freeman, whose searching criticisms provided stimulus for five revisions. Valuable
suggestions were supplied by Professors Kingsbury and Robinson and by Mr.
Kornhauser.

⁵ A description of the form is impossible here because of space limitations but
the following description of the first trait may be taken as a sample.

1. Speed of Movement.

Usual rate of movement (relative to his size and age).

How does he move about naturally, normally, habitually? Consider the
rate after he gets started, and when not impelled by any particular urge.

Does he walk quickly—or slowly? How does he talk in usual, normal
speech? Write? How fast, or how slow, are his movements in shuffling and
dealing cards; in sealing and stamping an envelope? Not how fast he can
walk, talk, write, seal an envelope, if under stress, but how fast or how slow is
he when at his usual duties?

slow ---------------- average ---------------- fast

⁶ "Is the Rating of Human Character Practicable?" Journal of Educational
Psychology, Vol. XII (1921), pp. 426-38.

The forms were sent by mail with stamped envelope for return.

The Data 

Of 106 forms sent out to each group, 95 per cent were returned
from the teachers, 86 per cent from the parents, and 73 per cent from
the friends. Of these, 64 per cent were common to raters in the three
classes, which after discarding four because of incomplete scores, left
64 or 60 per cent, perfect and complete. These were used for most
of the computations which follow.


I. Correlation of test scores with the estimates of the three sets of judges—
pooled. Trait by trait.

[Table I: entries not recoverable.]

II. Correlation of test scores with the estimates of the three sets of judges—
separately. Trait by trait.

[Table II: column headings and part of the rows not recoverable. Surviving entries, in printed order:]

[Row label lost] estimates:  .11  .01  .07  .10  .13  .06  .00  .00  .22  .05  .10;  final column .0542
Friends' estimates:  .13  .00  .30  .07  .02  .01  .03  .17;  final column .0007
Probable errors, where legible: ±.08
The Spearman Rank-Difference Formula was used for these computations.
Use was made of the Scott Company's "Tables to Facilitate the Computation of
Coefficients of Correlation by the Rank-Difference Method," which follows the
formula

$\rho = 1 - \dfrac{6 \sum D^2}{N(N^2 - 1)}$
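For readers who wish to reproduce this kind of coefficient without printed tables, the following is a minimal sketch of the rank-difference computation: one minus six times the sum of squared rank differences, divided by N(N² − 1). Tied values are given their average rank, which is a common convention but an assumption here. The probable-error helper uses the conventional 0.6745(1 − r²)/√N; the source does not say which formula produced its ± entries, so that too is an assumption. The example scores are invented.

```python
def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_rho(x, y):
    """Spearman's rho by the rank-difference formula."""
    n = len(x)
    rank_x, rank_y = average_ranks(x), average_ranks(y)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


def probable_error(r, n):
    """Conventional probable error of a correlation coefficient (assumed)."""
    return 0.6745 * (1 - r ** 2) / n ** 0.5


# Invented test scores and pooled estimates for five hypothetical subjects.
test_scores = [1, 2, 3, 4, 5]
estimates = [2, 1, 4, 3, 5]
rho = rank_difference_rho(test_scores, estimates)
print(round(rho, 2), round(probable_error(0.11, 64), 2))
```

With 64 complete cases and a coefficient near zero, the assumed probable-error formula gives about .08, which is at least consistent with the ±.08 entries that survive in the tables.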
III. Correlation of estimates with estimates of the several judges. Trait by
trait. (Rank-Difference Method)

[Table III: layout not recoverable. Legible column headings: Freedom from load; Flexibility; Finality of judgment; Interest in detail; Coordination of impulses. Surviving entries, in printed order:]

[Row label lost]:  .34  .33  .17;  final column .1425
[Row label partly lost] ... friends:  .00  .01  .04  .28  -.02;  final column .0702
[Row label partly lost] ... friends:  .08  .16  .24  .33  .10  .80
Probable errors, where legible: ±.06 to ±.08


IV. Correlation of the individual form with the group form. Trait by trait.
(Product-Moment Formula)

[Table IV: column headings not recoverable. Surviving entries, in printed order:]

Individual-group:  .67  .63  -.01  .18  -.22  .48  .04  .20;  final column .2230
Probable errors, where legible: ±.06 to ±.08


V. Correlation of Downey test total scores with point scores of Terman
group intelligence test.

ρ = +.21 (Rank-Difference Method)

VI. Correlation of total scores from individual forms with total scores from
group form.

ρ = +.60 (Rank-Difference Method)

The measure of agreement between the pooled estimates and the
test scores is, in the viewpoint of Rugg,¹ of most significance. These
correlations appear to be consistently low or negligible. Correlation
is present but low in certain traits, namely speed of movement,
flexibility, speed of decision, motor inhibition, and interest in detail.
In the other traits it is indifferent or negligible.

From these gross results it may be concluded that disagreement 
exists, but nothing more. To convict the test without inquiring into 
the character of the witness would be a misuse of statistical method. 


¹ "Is the Rating of Human Character Practicable?" op. cit.
Analysis of the Data

The explanation of the disagreement may lie in any one of three 
directions. There are these possibilities: 

(a) That the test is inadequate or defective as a measure of these
traits;

(b) That the estimates are unreliable; 

(c) That both test scores and estimates are expressions of some¬ 
thing, but of different things. 

Considering these in turn, it may be said that (a) is prima facie a
reasonable and natural interpretation. Certain facts, however, cannot
be passed by without comment. The first is that, if this hypothesis
is the correct one, some evidence would be afforded by repeating the
test with the same subjects. This was not done, but the giving of the
group form two to four months later constitutes a condition
approximating repetition, since a number of the traits are tested in similar
ways in both forms. High correlation appears in five traits (those
objectively secured and scored) while in the other traits negligible
or inverse correlations are found. This indicates some degree of
confirmation in the instance of the five traits, but from another
viewpoint it casts serious doubt on the worth of the group form. The
second point is that the nature of the problem is such that usual limits
for high and low correlation could not, in perfect fairness, be applied.
The error-possibilities are exceptionally manifest. To enumerate a
few, there are first the errors of interpretation: varying interpretations
of the printed descriptions of the traits; errors due to a common bias
on the part of all observers; those due to observers taking different
points of view; and random errors, including guesses. The errors of
recording also operate to limit the correlation possibilities; these arise
because of individual differences in the process of registering the
judgment, once made. Since the rater is using for his yardstick a
group of acquaintances, his judgment will be the same as that of
another judge only in the instance that the two have identical groups
of friends, which rarely if ever is the case. The circle of one judge's
friends may embrace fewer than one hundred persons; another's may
include ten times that number. Persons differ, furthermore, in the
accuracy of their observations and in the keenness of their estimations
of ability. This condition offers no excuse for lack of consistency in
the test, but rather suggests the inherent difficulty of this type of
research—whether such estimations of human qualities are practicable
at all.
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Plate I. 
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Plate II. 
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(h) That the Estimates Are Unrdiahle. —With the expectation that 
the plotting of distribution curves would throw some light upon the 
disparity between scores and estimates, curves were made for each 
trait, for each of the three series of estimates. (See Plates I and II.) 
Normally the test scores would be expected to follow a rectangular 
distribution, since that distiibution was adopted by Dr. Downey in 
constructing the norms. This condition was found to obtain in 
traits 1, 8, 9, 11 and roughly in 2.- Trait 6 was concentrated about 
the score 4; 9 about 5; 10 and 11 about 8 and 9 respectively. 

An examination of the distribution of the estimates reveals other
significant features. Of the 36 curves representing distributions of
estimates, only 12 show any resemblance to a normal probability
curve: 1-t, 1-p, 3-f, 5-p, 5-f, 6-p, 7-t, 8-t, 9-t, 10-t, 10-f, 11-t,
and 12-f. Twenty-one of the total exhibit an undue and disproportionate
concentration about the score of 5. Seventeen show a similar
concentration about the scores of 9 and 10. Four instances occur of a
concentration about the scores of 7 and 8, or the three-quarters point.

The "bunching" of scores about the half-way point in so many
instances (and at the end and three-quarters position in other
instances) strongly indicates that in more than half the cases the
raters were unable to come to any certain conclusion and hence had
recourse to the conservative "average" rating; and in the cases of
higher ratings, were likewise unable to judge accurately and rated high
because the bias worked toward giving a favorable return.

With such observations in mind it becomes extremely probable
that the unreliability of the ratings is very considerable, and that
therefore the fact of low correlation may be traced to this as one of the
sources. Just what this unreliability has to do with making the
correlations as low as they are it is impossible to determine. The low
correlations among raters add weight to the supposition of
unreliability; furthermore, the one instance of high correlation
(parents with friends) is explainable by the fact that in numbers of
cases the returns are practically identical, indicating that the
judgments were made in conference (this is known to be the case in
several instances).

(c) That Both Test-scores and Estimates Are Expressions of Something
but the Two Values Are Different Things. It is conceivable that the
test for speed of decision does not actually measure speed but rather
something allied yet different, as possibly care in decision, or honesty
in decision; while the rater was judging with quite a different
criterion, and may have emphasized ease in coming to decision or
something else. In any such instance no considerable correlation could
be expected. In this possibility there is no practicable way of checking
back to investigate the actual condition.

Conclusion 

In summarizing these results it may be said that: 

1. Estimates of traits judged from fairly explicit descriptions of 
them show indifferent or negligible correlation with scores earned in the 
Downey Test, individual form. 

2. This absence of high positive correlation is not to be taken as
absolute evidence against the validity of the test, for the reason that
there are indications that the estimates were in many cases unreliable;
but rather as grounds for questioning the value of the test for certain
of the traits.

3. From the point of view of practical utility, these results may be
taken to indicate that, in its present state of development, the test is
considerably imperfect because the traits it purports to measure are
not such that people can readily understand and identify.

THE JOINT YIELD FROM TEAMS OF TESTS

CLARK L. HULL 

University of Wisconsin, Madison, Wisconsin 

In the early days of mental testing, even where several tests were 
combined for a single determination, little attention was paid to the 
correlations among the tests themselves, although in a few cases there 
seems actually to have been a conscious attempt to have the tests 
correlate as highly among themselves as possible. At the present 
time, however, it is generally recognized by psychologists that for the 
tests of a team or battery to yield the maximum prediction, there must 
be not only as high a correlation as possible between each test and the 
criterion, but at the same time as low a correlation as possible among 
the individual tests. 

This fundamental contrast accordingly divides the correlation
coefficients involved in a team of tests into two distinct groups. The
division in question is shown systematically in Table I. Here
appear the correlation coefficients necessary for a regression equation
involving five tests. In the subscripts, I signifies the criterion, the
Arabic numerals designating the various tests. The coefficients
appearing in the first column should accordingly be as high as possible
while those in the remaining columns should be as low as possible. If,
for purposes of analysis, we assume all the coefficients in the
first column as equal, we may represent them without distinction by r';
and if for similar reasons we assume all of the remaining coefficients as
equal, we may represent them uniformly by r''.


TABLE I

To be as high as possible (r')    To be as low as possible (r'')

$r_{I1}$
$r_{I2}$                          $r_{12}$
$r_{I3}$                          $r_{13}$   $r_{23}$
$r_{I4}$                          $r_{14}$   $r_{24}$   $r_{34}$
$r_{I5}$                          $r_{15}$   $r_{25}$   $r_{35}$   $r_{45}$


Now if the special r values assumed above be substituted 
appropriately in the formulas for partial correlation and the results 
simplified in each case, the following expressions are obtained: 

$$r_{I1.2} = \frac{r'(1 - r'')}{\sqrt{1 - r'^2}\,\sqrt{1 - r''^2}}$$

$$r_{12.3} = \frac{r''}{1 + r''}$$

$$r_{I1.23} = \sqrt{\frac{r'^2(1 - r'')}{(1 + r'' - 2r'^2)(1 + 2r'')}}$$

$$r_{12.34} = \frac{r''}{1 + 2r''}$$

$$r_{I1.234} = \sqrt{\frac{r'^2(1 - r''^2)}{(1 + 3r'' - 3r'^2 - 3r''r'^2 + 2r''^2)(1 + 3r'')}}$$

These values being substituted appropriately in the general formula for
multiple correlation,

$$R_{I(12 \ldots n)} = \sqrt{1 - (1 - r_{I1}^2)(1 - r_{I2.1}^2) \cdots (1 - r_{In.12 \ldots (n-1)}^2)}$$

we obtain by successive application, the following:

For two tests, $R = r'\sqrt{\frac{2}{1 + r''}}$

For three tests, $R = r'\sqrt{\frac{3}{1 + 2r''}}$

For four tests, $R = r'\sqrt{\frac{4}{1 + 3r''}}$

The succeeding members of the series may now be written by analogy
thus:

For five tests, $R = r'\sqrt{\frac{5}{1 + 4r''}}$

For six tests, $R = r'\sqrt{\frac{6}{1 + 5r''}}$

Or, more generally:

For n tests,

$$R = r'\sqrt{\frac{n}{1 + r''(n - 1)}} \qquad (1)$$

From (1) is obtained directly:

$$n = \frac{R^2(1 - r'')}{r'^2 - r''R^2} \qquad (2)$$

$$r' = R\sqrt{\frac{1 + r''(n - 1)}{n}} \qquad (3)$$

$$r'' = \frac{nr'^2 - R^2}{R^2(n - 1)} \qquad (4)$$
With the aid of equations (1) to (4), we may now proceed to the
quantitative analysis of the relation of the four basic factors involved
in predictions from teams of tests. The first matter to be considered is
the amount contributed to the yield (R) of a team of tests, by the
addition of each constituent test. To be specific: if we assume that
r' = .40 and r'' = .20, how much does the addition of a second test to
the first of a team increase the yield over that given by the first alone?
How much does a third test add to the yield of the first two? How
much does the fifth or the seventh test add? How much will 50 or 100
tests yield over five or six? By substituting successively in equation
(1) we obtain the values in Table II, which gives the yield of varying
numbers of tests of the potency assumed above.

Perhaps the most striking fact revealed by Table II is the marked
tendency to diminishing returns met with as successive tests are added.
This tendency is shown even more clearly by Table III.

TABLE II

Showing the yield of varying numbers of tests where r' = .40 and r'' = .20.

   2 tests, R = .516
   3 tests, R = .584
   4 tests, R = .632
   5 tests, R = .667
   6 tests, R = .693
  50 tests, R = .8596
  51 tests, R = .8616
 100 tests, R = .8768
 101 tests, R = .8772
1000 tests, R = .8926


TABLE III

Showing the amount added to the yield (R) by each succeeding test of the
series considered in Table II.

No.   1, .40
No.   2, .116
No.   3, .068
No.   4, .048
No.   5, .035
No.   6, .026
No.  51, .002
No. 101, .0004

A somewhat more comprehensive view of the variability in yield of
teams of tests as dependent upon the values of n and r'' is shown in
Fig. 1. The yield is shown in multiples of r' in order to make the
curves comparable and as general as possible. Here it is seen that at
the extreme where r'' = 1.00, there can be no increase in R from the
addition of new tests. As r'' grows lower the possibility of increasing R
by the addition of tests becomes increasingly greater. In all cases,
however, except where r'' = -.10, the curves show a tendency to
diminishing returns with the increase in the number of tests.

The relatively simple situation revealed by Fig. 1 undergoes
interesting complications when r' is also varied and the yield is shown in
terms of R. A few typical curves of this second kind are shown in
Fig. 2. The two pairs of curves originating respectively at .30 and
.40 may be compared. In the upper curve of each pair r'' = .00; in
the lower, r'' = .20. The frequent crossing of the various curves
shows that there are many cases of approximately equivalent yields
from different r' and r'' combinations if n be chosen suitably. For
example, there is no choice between (r' = .40, r'' = .00) and (r' = .50,
r'' = .65) where two tests are used. With a smaller number of tests
the combination (r' = .50, r'' = .65) gives the greater yield but with
more tests the combination (r' = .40, r'' = .00) is decidedly better.
With five tests the combination (r' = .30, r'' = .00) yields the same as
(r' = .40, r'' = .20), both being better than (r' = .50, r'' = .65).
With more tests, (r' = .30, r'' = .00) is decidedly better than either.
The remarkable upward sweep of the curve (r' = .05, r'' = -.10) is
also to be noted. Speaking roughly, high values of both r' and r'' are
superior to low values of both where only a small number of tests are
used, but the latter tend to be superior where more tests are available.

Now the r' values of single aptitude tests for genuine vocations do
not ordinarily rise much above .40 and they are usually lower. If the
tests are chosen wisely, r'' may be kept down around .20 or so. After
considering the above examples of the diminishing returns in the yield
of teams of tests with the increase in the number of tests employed,
particularly as shown in Table III, we find ourselves questioning
whether the increase in yield from the addition of a sixth or a seventh
test of the grade just mentioned will be worth the extra time and labor
necessary for the giving and scoring of it. It is quite evident that on
the principle of diminishing returns a point is certain to be reached
sooner or later where the addition of a new test will not pay. It is
clearly a matter of some importance to the future of vocational
psychology, whether this point is reached before or after a really
satisfactory R has been obtained. It is evident that the point at which
this critical change takes place is dependent upon a number of factors.
As a necessary preliminary in the analysis, it is desirable to determine
the number of tests of the grade assumed, which will be necessary to
produce an R of satisfactory proportions. Assuming R = .75 as
about the lower limit for purposes of satisfactory vocational guidance,
say, the number of tests required may readily be found by substituting
appropriately in equation (2):

$$n = \frac{.75^2(1 - .20)}{.40^2 - .20 \times .75^2} = 9.47$$

It must be confessed that the prospect of having to use nine or ten
tests is not particularly reassuring. It is true, of course, that certain
batteries of tests now in use include a number of test units approaching
this, notably Army Alpha which has eight. Such a large number is
only possible where the methods of administering and scoring the
tests are very economical. Under extremely favorable circumstances
in this respect it is likely that, if available, a number of tests distinctly
greater than ten might be profitable. But the availability of such a
large number of relatively high grade tests which will at the same time
be economical in administration is a much more serious question.
As a general thing, the tests which have been economical in
administration so far have been group tests. But as now conducted, such
tests seem fated to have an extremely high correlation among
themselves owing to the large element of pencil-and-paper behavior which
they have in common. Thus the r'' values of Army Alpha range around
.60 and those of the National Intelligence Tests run even higher.
Fortunately the r' values of these tests also run much higher than the
.40 assumed above. But even so, the proportions of curve (r' = .50,
r'' = .65) in Fig. 2 show that the hope of increasing the size of R
very much by the multiplication of such tests is not well founded when
the number exceeds three or four. Indeed, it is probable that one or
more of the less efficient units of each of the well known batteries of
tests just considered might be discarded without appreciably
diminishing the predictive yield of the respective teams as a whole.

An interesting question suggested by Table II is whether the
combination (r' = .40, r'' = .20) would ever produce a perfect R (1.00)
even with an unlimited number of such tests available. To answer
this question it will be necessary to carry our mathematical analysis a
little further. An inspection of the right-hand member of equation
(2) above shows that $r''R^2$ may never become larger than $r'^2$, else n
would become negative, which is impossible. That is to say:

$$r''R^2 \le r'^2 \qquad (5)$$

If we further assume R to be at its maximum (1.00) then (5) becomes:

$$r'' \le r'^2 \qquad (6)$$

That is, if R is to reach perfection, then r'' can never exceed $r'^2$. We
are accordingly able to state definitely in answer to the above question,
that no matter how many tests of the strength (r' = .40, r'' = .20) may
be combined, a perfect prediction can never be obtained, since .20 is
greater than $.40^2$, which is only .16.

In cases such as that just considered, where a perfect prediction
can not be obtained, it becomes a matter of some interest to know what
the maximum possible yield may be. By referring to equation (2)
once more, it is apparent upon inspection that n will become infinite if

$$r'^2 - r''R^2 = 0 \qquad (7)$$

Accordingly,

$$R = \frac{r'}{\sqrt{r''}} \qquad (8)$$

But since by inspection, R is at its maximum when n = ∞, equation
(8) will give the limiting yield of any combination of r' and r'' values
in which $r'' > r'^2$. Thus in the case of the combination (r' = .40,
r'' = .20) already considered,

$$R = \frac{.40}{\sqrt{.20}} = .8944$$

As would be expected, this value differs only very slightly from that
obtained when n = 1,000 as shown in Table II. As a second example
of the application of (6) and (8) we may consider Army Alpha where
roughly r' = .58 and r'' = .60.

$$R = \frac{.58}{\sqrt{.60}} = .749$$

It is interesting to note that this figure is not much above what was
obtained by the army psychologists under favorable conditions. Of
perhaps more significance is the indication that with an indefinitely
larger number of such tests available, they could not have materially
bettered the yield actually obtained.
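
Conditions (6) and (8) together give the ceiling on R for any grade of test. The following sketch is added for illustration only; the function name `limiting_yield` is an assumption, and the rule of returning 1.0 whenever r'' does not exceed the square of r' simply restates condition (6).

```python
import math


def limiting_yield(r_crit, r_inter):
    """Largest R obtainable from unlimited tests of a given grade.

    r_crit is r' and r_inter is r''. When r'' <= r'^2 (condition 6) a
    perfect R of 1.00 is attainable; otherwise the ceiling is r' / sqrt(r'')
    (equation 8).
    """
    if r_inter <= r_crit ** 2:
        return 1.0
    return r_crit / math.sqrt(r_inter)


if __name__ == "__main__":
    print(round(limiting_yield(0.40, 0.20), 4))  # the combination of Table II
    print(round(limiting_yield(0.58, 0.60), 3))  # the rough Army Alpha values
```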

A special case of some interest is that in which the tests are all
assumed to be strictly independent of each other, i.e., where r'' =
.00. In this event equations (1), (2) and (3) reduce respectively to:

$$R = r'\sqrt{n} \qquad (9)$$

$$n = \frac{R^2}{r'^2} \qquad (10)$$

$$r' = \frac{R}{\sqrt{n}} \qquad (11)$$

404 


The Journal of Educational Psychology 


And if in addition R be assumed as perfect (R = 1.00) then (10) and
(11) become respectively:

$$n = \frac{1}{r'^2} \qquad (12)$$

$$r' = \frac{1}{\sqrt{n}} \qquad (13)$$

By substituting appropriately in the above equations, we may obtain
the r' values necessary to produce the various values of R from varying
numbers of tests. Certain of these are shown systematically in
Table IV.


TABLE IV

             R = 1.00    R = .75    R = .50
                r'          r'         r'
n =   2        .709        .532       .354
n =   3        .578        .433       .289
n =   4        .50         .375       .25
n =   5        .440        .335       .223
n =  10        .316        .237       .158
n = 100        .10         .075       .05


The entries in the column "R = 1.00" are of some theoretical interest.
These figures show that from the synthetic point of view, the half of a
perfect correlation is not .50 but .709; the third of a perfect correlation
is not .333 but .578; the fourth of a perfect correlation is not .25 but
.50, and so on. In this connection, formula (13) may be compared
with Kelley's "coefficient of alienation" or lack of correlation (k).
This value is given by the equation

$$k = \sqrt{1 - r^2}$$
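
The entries of Table IV follow from equation (11), r' = R divided by the square root of n, for strictly independent tests. The lines below are a small illustrative sketch, not part of the original article; the function name `required_r` is an assumption. Values recomputed this way may differ from the printed table in the third decimal place.

```python
import math


def required_r(target_r, n):
    """r' each of n independent tests (r'' = .00) needs to give target_r (equation 11)."""
    return target_r / math.sqrt(n)


if __name__ == "__main__":
    for n in (2, 3, 4, 5, 10, 100):
        row = "  ".join(f"{required_r(target, n):.3f}" for target in (1.00, 0.75, 0.50))
        print(f"n = {n:3d}  {row}")
```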

Passing now from the mathematical to the speculative, let us
consider the notion of a universal team of tests. Ideally this would be
a battery of tests which would cover the entire range of the determiners
of human behavior, yet with no overlapping of one test by another.
How many really independent tests would it take thus to compass the
range of human aptitudes? Judging by the great amount of overlapping
of most mental alertness tests, it would not take very many really
independent tests, if such could be found, to span this particular zone
of human behavior. We know less about motor traits, and particularly
about character traits, but presumably it will require a larger number
of tests to cover these respective phases of human potentiality, to
mention no others. And while it would be obviously unsafe to hazard a
guess as to how many tests would be required as a minimum for such a
team to approach universality sufficiently close to be useful, it is
possible that the number may be smaller than might at first be
supposed.

The advantage of such a battery of tests for purposes of
vocational guidance, for example, would be enormous. Perhaps this
may best be shown by pointing out the hopelessness of the system of
vocational guidance as it is now developing. To be able to predict
the probable vocational aptitude of a youth in one or two vocations
may serve very well the purposes of a prospective employer but by no
means that of the youth seeking a life vocation. What the youth
desires to know is what vocation of all the vocations of the world he is
best endowed to pursue. This can be told only after knowing, in some
sense, his aptitude in each. Now it may be that a really adequate
vocational guidance is an impossibility, that psychologists will not be
able to secure sufficient tests of the necessary diagnostic potency.
But even assuming an unlimited supply of separate teams of tests of
the type now generally aimed at (the tests of each properly weighted
by means of a regression equation for its particular vocation), the
problem of vocational guidance would still be far from solved because
of the expense involved in such a system. If each battery should
require one or two hours for giving, scoring, etc., and the multiplicity
of vocations should be reduced to 40 or 50 type occupations, it would
require from 50 to 100 hours of labor to discover the two or three most
promising occupations for each individual! But if, instead of a
multiplicity of teams of tests, a single approximately universal battery
of tests were available with which, through the perfection of group
testing by means of self-recording duplicatable apparatus and other
devices, groups of 25 or 50 subjects could be tested at once, vocational
guidance might easily be economical enough to become universal. By this
latter method there would be merely a multiplication of regression
equations, each equation weighting the same team of tests in a
different way according to the peculiar mental requirements for success
in each particular vocation. Thus but one set of tests would need to
be administered and scored. And lastly there might be an automatic
computing apparatus built on an extension of the principle of
the modern statistical machines, which would solve the regression
equations. With the test data from a given subject punched down on
an appropriate form, this might be fed into the machine and presently
emerge with the whole series of predictions recorded in the units of
some convenient, uniform scale that would permit of instant
comparison. This, of course, is all highly Utopian. But without some such
revolution in method as suggested above, it is difficult to see how
scientific vocational guidance can ever become a practical reality for the
masses of the people.
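
The scheme of one battery scored once and weighted afresh by a separate regression equation for each vocation can be sketched in a few lines. The sketch is purely illustrative: the vocation names, intercepts, weights and scores are hypothetical values invented for the example, not data from the article, and the function name `predict_all` is likewise an assumption.

```python
def predict_all(scores, equations):
    """Apply every vocation's regression equation to one set of test scores.

    scores    -- scores of one subject on the single shared battery
    equations -- mapping of vocation name to (intercept, weights)
    """
    predictions = {}
    for vocation, (intercept, weights) in equations.items():
        predictions[vocation] = intercept + sum(w * s for w, s in zip(weights, scores))
    return predictions


if __name__ == "__main__":
    # Hypothetical weights for three tests and two vocations.
    equations = {
        "clerical": (1.0, [0.5, 0.2, 0.1]),
        "mechanical": (0.5, [0.1, 0.3, 0.6]),
    }
    subject_scores = [12.0, 8.0, 15.0]  # hypothetical scores on the shared battery
    for vocation, value in predict_all(subject_scores, equations).items():
        print(vocation, round(value, 2))
```

The point of the design is that the testing cost is paid once, while adding a vocation costs only one more set of weights.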

Passing to the immediately practical, there seems little doubt that
the near future will see a tendency towards semi-universal teams of
tests, that is, single batteries of tests which will apply to a limited
variety of occupations of a given type, there being separate regression
equations based on the same tests for the various occupations of the
particular group. And along with this ought also to come a specific
concentration upon the psychology of the independencies among tests,
with consequent changes in the point of view in devising new tests and
in the technique of choosing the individual tests to make up teams or
batteries for purposes of vocational prognosis.



THE ADVANTAGES OF THE PROBABLE ERROR OF
MEASUREMENT AS A CRITERION OF THE
RELIABILITY OF A TEST OR SCALE

P. H. NYGAARD
University of Minnesota

To measure the reliability of an educational test several methods
have been employed. The first was to use the coefficient of
correlation obtained by giving two forms of the test to the same group. A
later improvement was to use the probable error of estimate of score
between the two forms; that is, $.6745\,\sigma\sqrt{1 - r^2}$. Recently it has
been proposed, especially by Monroe, to employ the probable error of
measurement, $.6745\,\sigma\sqrt{1 - r}$. This article will show, first, that the
probable error of measurement possesses the advantage of stability
over the two other criteria, and, secondly, that its value can be
computed by a comparatively easy method.

I. Stability of the PE of Measurement

It is a well known fact that the coefficient of reliability can be
increased by increasing the range. If only one school grade is used,
the coefficient obtained will be rather low, but all that is necessary to
get a higher one is to give the two forms to a composite group
consisting of several grades. Neither, as will be seen, is the probable
error of estimate free from variation. However, the probable error of
measurement remains the same for a composite group as for a single
group.

To show that such is the case the following assumptions will be 
made: 

1. That two tests, or forms of the same test, A and B, are given to
N groups of n cases each.

2. That the standard deviation of both A and B in each group is the
same, namely $\sigma_1$.

3. That the difference between the means from one group to the
next for both A and B is constant and the same. Let c equal the ratio
of this difference to $\sigma_1$; that is, c is the difference divided by $\sigma_1$.

4. That the correlation between A and B is the same for each
separate group; namely $r_1$. Also whenever r is used, the absolute
value of r will be understood.

These assumptions are not far-fetched, for they represent what
practically always happens, or should happen, when forms A and B
of the same test are given to the school grades for which the test is
intended. With these assumptions from which to start, the aim will
be to investigate what occurs when A and B are given to a composite
group consisting of N of the smaller groups. Here, for the most part,
only results will be given, as the proofs involve quite lengthy algebraic
manipulations.

First, as to the composite standard deviation. Let this be
represented by $\sigma_N$. Then

$$\sigma_N = \sigma_1\sqrt{\frac{12 + c^2(N^2 - 1)}{12}} \qquad (1)$$

Secondly, in regard to the composite coefficient of correlation.
Let this be represented by $r_N$. Then

$$r_N = 1 - \frac{12(1 - r_1)}{12 + c^2(N^2 - 1)} \qquad (2)$$

If $r_1 = 1$, then $r_N = 1 - 0 = 1$. If $r_1 = 0$, then there should be no
difference in the means, so c = 0, which makes $r_N = 1 - \frac{12}{12} = 0$. A
glance at the formula shows that $r_N$ increases with an increase in c
and also with an increase in N. By a slight algebraic change the
formula may be written

$$r_N = r_1 + \frac{c^2(N^2 - 1)(1 - r_1)}{12 + c^2(N^2 - 1)}$$

This shows that $r_N$ exceeds $r_1$, except when c = 0, N = 1, or $r_1 = 1$.

Thirdly, the effect on the composite probable error of estimate.
Let this be represented by $e_N$, and the probable error of estimate of a
single group by $e_1$. Then

$$e_N = e_1\sqrt{1 + \frac{c^2(N^2 - 1)(1 - r_1)}{[12 + c^2(N^2 - 1)](1 + r_1)}} \qquad (3)$$

The fraction under the radical sign is always positive, and hence $e_N$
exceeds $e_1$, except when c = 0, N = 1, or $r_1 = 1$.

Formulas (1), (2), and (3) may in themselves be of considerable
service. They may be used to transform composite group results
into equivalents for a single group, or vice versa.

Fourthly, as to the probable error of measurement. Let this be
represented by $\epsilon_N$, and the probable error of measurement of a single
group by $\epsilon_1$. As the algebraic work in this case is brief, it will be
given in full. Using the results of formulas (1) and (2):

$$\epsilon_N = .6745\,\sigma_N\sqrt{1 - r_N} = .6745\,\sigma_1\sqrt{\frac{12 + c^2(N^2 - 1)}{12}}\,\sqrt{\frac{12(1 - r_1)}{12 + c^2(N^2 - 1)}}$$

$$\epsilon_N = .6745\,\sigma_1\sqrt{\frac{12 + c^2(N^2 - 1)}{12} \cdot \frac{12(1 - r_1)}{12 + c^2(N^2 - 1)}} = .6745\,\sigma_1\sqrt{1 - r_1}$$

But $\epsilon_1$ is also equal to $.6745\,\sigma_1\sqrt{1 - r_1}$.

Therefore, $\epsilon_N = \epsilon_1$.

Hence the same value will be obtained for the probable error of
measurement whether a single group or a composite group be used,
which was found to be true of neither the coefficient of correlation as a
measure of reliability nor the probable error of estimate. This is the
basis for the claim made at the beginning, that the probable error of
measurement is a stable quantity, whereas the other criteria are not.
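
The behavior of the three criteria under formulas (1) and (2) can be checked numerically. The sketch below is an illustration added here, not part of the original article; the function names and the sample values (a single-group standard deviation of 10, a single-group correlation of .60, c = 1, N = 4) are assumptions chosen for the example.

```python
import math

PE_FACTOR = 0.6745


def composite_sd(sigma1, c, n_groups):
    """Formula (1): standard deviation of the composite of n_groups groups."""
    return sigma1 * math.sqrt((12 + c ** 2 * (n_groups ** 2 - 1)) / 12)


def composite_r(r1, c, n_groups):
    """Formula (2): correlation between forms A and B in the composite group."""
    return 1 - 12 * (1 - r1) / (12 + c ** 2 * (n_groups ** 2 - 1))


def pe_estimate(sigma, r):
    """Probable error of estimate, .6745 * sigma * sqrt(1 - r^2)."""
    return PE_FACTOR * sigma * math.sqrt(1 - r ** 2)


def pe_measurement(sigma, r):
    """Probable error of measurement, .6745 * sigma * sqrt(1 - r)."""
    return PE_FACTOR * sigma * math.sqrt(1 - r)


if __name__ == "__main__":
    sigma1, r1, c, n_groups = 10.0, 0.60, 1.0, 4  # assumed sample values
    sigma_n = composite_sd(sigma1, c, n_groups)
    r_n = composite_r(r1, c, n_groups)
    print("r:             ", round(r1, 3), "->", round(r_n, 3))
    print("PE estimate:   ", round(pe_estimate(sigma1, r1), 3), "->",
          round(pe_estimate(sigma_n, r_n), 3))
    print("PE measurement:", round(pe_measurement(sigma1, r1), 3), "->",
          round(pe_measurement(sigma_n, r_n), 3))
```

With these values the correlation and the probable error of estimate both rise in the composite group, while the probable error of measurement is unchanged.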


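II. An Easy Method of Calculating the Probable Error of
Measurement

The usual formula for the probable error of measurement, which
will be called $\epsilon$, is: $\epsilon = .6745\,\sigma\sqrt{1 - r}$. Two forms of the same test
have practically the same standard deviation. This will be assumed
to be true. Arbitrary zero points will be used. The deviations from
these arbitrary zero points will be represented by X and Y. By the
ordinary formula, $\epsilon = .6745\,\sigma\sqrt{1 - r}$.

But
$$r = \frac{\frac{\Sigma XY}{n} - \frac{(\Sigma X)(\Sigma Y)}{n^2}}{\sigma^2}$$

Therefore
$$\epsilon = .6745\,\sigma\sqrt{1 - \frac{\frac{\Sigma XY}{n} - \frac{(\Sigma X)(\Sigma Y)}{n^2}}{\sigma^2}}$$

Or
$$\epsilon = .6745\sqrt{\sigma^2 - \frac{\Sigma XY}{n} + \frac{(\Sigma X)(\Sigma Y)}{n^2}}$$

Substitute for $\sigma^2$ its value, $\frac{\Sigma X^2}{n} - \frac{(\Sigma X)^2}{n^2}$.

Then
$$\epsilon = .6745\sqrt{\frac{\Sigma X^2}{n} - \frac{(\Sigma X)^2}{n^2} - \frac{\Sigma XY}{n} + \frac{(\Sigma X)(\Sigma Y)}{n^2}}$$

Multiplying numerator and denominator of each fraction by 2,

$$\epsilon = .6745\sqrt{\frac{2\Sigma X^2}{2n} - \frac{2(\Sigma X)^2}{2n^2} - \frac{2\Sigma XY}{2n} + \frac{2(\Sigma X)(\Sigma Y)}{2n^2}}$$

But
$$\frac{\Sigma X^2}{2n} - \frac{(\Sigma X)^2}{2n^2} = \frac{\sigma^2}{2} = \frac{\Sigma Y^2}{2n} - \frac{(\Sigma Y)^2}{2n^2}$$

Substitute the latter for one of the former, getting

$$\epsilon = .6745\sqrt{\frac{\Sigma X^2}{2n} - \frac{(\Sigma X)^2}{2n^2} + \frac{\Sigma Y^2}{2n} - \frac{(\Sigma Y)^2}{2n^2} - \frac{2\Sigma XY}{2n} + \frac{2(\Sigma X)(\Sigma Y)}{2n^2}}$$

Or
$$\epsilon = .6745\sqrt{\frac{\Sigma X^2 - 2\Sigma XY + \Sigma Y^2}{2n} - \frac{(\Sigma X)^2 - 2(\Sigma X)(\Sigma Y) + (\Sigma Y)^2}{2n^2}}$$

Or
$$\epsilon = .6745\sqrt{\frac{\Sigma (X - Y)^2}{2n} - \frac{(\Sigma X - \Sigma Y)^2}{2n^2}}$$

Let X - Y = V.

Then
$$\epsilon = .6745\sqrt{\frac{1}{2n}\left[\Sigma V^2 - \frac{(\Sigma X - \Sigma Y)^2}{n}\right]}$$

By letting the arbitrary zero points be the zero of each scale, the
X's and Y's are simply the original scores, and no deviations need to be
computed. Nevertheless, the numbers involved will not be large, for
only differences are squared.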

To illustrate the simplicity of the method, a short problem will be
worked. Suppose the numbers in the columns below, headed X and
Y, represent scores in forms A and B, respectively, of a test, whose
variability is the same.

          X      Y     V²
         75     71     16
         77     74      9
         82     82      0
         73     64     81
         81     81      0
         79     86     49
         63     76    169
         70     78     64
         85     83      4
         80     80      0
Totals  765    775    392

$$\epsilon = .6745\sqrt{\tfrac{1}{20}(392 - 10)} = .6745\sqrt{19.1} = 2.9+$$
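
The short method amounts to one pass over the paired scores. The following sketch is added for illustration and is not the author's own computation; the function name `pe_measurement_from_scores` is an assumption. The scores are those of the worked problem as read from the table, with the sixth Y score taken as 86, the value consistent with the printed column totals of 775 and 392.

```python
import math


def pe_measurement_from_scores(x_scores, y_scores):
    """Probable error of measurement by the difference method.

    Uses .6745 * sqrt((1 / 2n) * [sum(V^2) - (sum(X) - sum(Y))^2 / n]),
    where V = X - Y, and assumes the two forms have equal standard deviations.
    """
    n = len(x_scores)
    sum_sq_diff = sum((x - y) ** 2 for x, y in zip(x_scores, y_scores))
    diff_of_sums = sum(x_scores) - sum(y_scores)
    return 0.6745 * math.sqrt((sum_sq_diff - diff_of_sums ** 2 / n) / (2 * n))


form_a = [75, 77, 82, 73, 81, 79, 63, 70, 85, 80]
form_b = [71, 74, 82, 64, 81, 86, 76, 78, 83, 80]  # sixth score assumed to be 86

if __name__ == "__main__":
    print(round(pe_measurement_from_scores(form_a, form_b), 2))
```

Only the differences are squared, so the arithmetic stays small even when the raw scores are large.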
To conclude, it may be stated that the probable error of
measurement is by far the easiest to calculate of the criteria of
reliability mentioned at the outset, and is the only one possessing the
requisite stability. Therefore, the writer is of the opinion that it
should meet with universal acceptance among the workers in
educational statistics.


Supplementary Proofs


The assumptions made here are the same as those previously made 
in the article on page 407. 

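1. Composite Standard Deviation. Before generalizing, let us take
two particular cases. Suppose N = 3, and $d = c\sigma_1$, that is, d is the
difference between the means. Then, as may be easily seen by making a
drawing of the three distributions,

$$\sigma_3 = \sqrt{\frac{\Sigma(x + d)^2 + \Sigma(x - d)^2 + \Sigma x^2}{3n}}$$

$$\sigma_3 = \sqrt{\frac{\Sigma(x^2 + 2dx + d^2) + \Sigma(x^2 - 2dx + d^2) + \Sigma x^2}{3n}} = \sqrt{\frac{3\Sigma x^2 + 2nd^2}{3n}}$$

Or $\sigma_3 = \sqrt{\frac{\Sigma x^2}{n} + \frac{2}{3}d^2}$. But $\frac{\Sigma x^2}{n} = \sigma_1^2$.

Therefore, $\sigma_3 = \sqrt{\sigma_1^2 + \frac{2}{3}d^2}$.

Suppose N = 4. Then we shall get

$$\sigma_4 = \sqrt{\frac{4\Sigma x^2 + 2n\left(\frac{d}{2}\right)^2 + 2n\left(\frac{3d}{2}\right)^2}{4n}}$$

Or
$$\sigma_4 = \sqrt{\frac{4\Sigma x^2 + 5nd^2}{4n}} = \sqrt{\sigma_1^2 + \frac{5}{4}d^2}$$

No matter what number is used for N, the quantity under
the radical sign will be $\sigma_1^2$ plus some number times $d^2$. If N is odd,
that number is evidently as follows:

$$\frac{2\left(1 + 4 + 9 + \ldots \text{ to } \frac{N - 1}{2} \text{ terms}\right)}{N}$$

But
$$\left(1 + 4 + 9 + \ldots \text{ to } \frac{N - 1}{2} \text{ terms}\right) = \frac{N(N^2 - 1)}{24}$$

Therefore, when N is odd, the number by which $d^2$ is multiplied is
$\frac{N^2 - 1}{12}$.

If N is even, the coefficient of $d^2$ is as follows:

$$\frac{2\left(\frac{1}{4} + \frac{9}{4} + \frac{25}{4} + \ldots \text{ to } \frac{N}{2} \text{ terms}\right)}{N} = \frac{\left(1 + 9 + 25 + \ldots \text{ to } \frac{N}{2} \text{ terms}\right)}{2N}$$

But
$$\left(1 + 9 + 25 + \ldots \text{ to } \frac{N}{2} \text{ terms}\right) = \frac{N(N^2 - 1)}{6}$$

Therefore, when N is even, the number by which $d^2$ is multiplied is
$\frac{N^2 - 1}{12}$.

In either case, therefore, the coefficient of $d^2$ is $\frac{N^2 - 1}{12}$. Hence
for any value of N,

$$\sigma_N = \sqrt{\sigma_1^2 + \frac{d^2(N^2 - 1)}{12}}$$

Replace d by its value $c\sigma_1$. Then,

$$\sigma_N = \sigma_1\sqrt{1 + \frac{c^2(N^2 - 1)}{12}}$$

This is formula (1) of the article.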

2. Composite Correlation. Again, to make the reasoning clearer,
N will be given definite values. Suppose N = 3. Then the ordinary
formula for the coefficient of correlation gives

$$r_3 = \frac{\Sigma(x + d)(y + d) + \Sigma(x - d)(y - d) + \Sigma xy}{3n\sigma_3^2}$$

Let us for the time being represent $\sqrt{1 + \frac{c^2(N^2 - 1)}{12}}$ by k. Then by
formula (1), $\sigma_N = k\sigma_1$.

Therefore, $\sigma_3^2 = k^2\sigma_1^2$, and

$$r_3 = \frac{3\Sigma xy + 2nd^2}{3nk^2\sigma_1^2} = \frac{\Sigma xy}{nk^2\sigma_1^2} + \frac{2}{3} \cdot \frac{d^2}{k^2\sigma_1^2}$$

But $\frac{\Sigma xy}{n\sigma_1^2} = r_1$.

Therefore,
$$r_3 = \frac{r_1}{k^2} + \frac{2}{3} \cdot \frac{d^2}{k^2\sigma_1^2}$$

Suppose N = 4. Then

$$r_4 = \frac{4\Sigma xy + 5nd^2}{4n\sigma_4^2}$$

But $\sigma_4^2 = k^2\sigma_1^2$. Therefore

$$r_4 = \frac{r_1}{k^2} + \frac{5}{4} \cdot \frac{d^2}{k^2\sigma_1^2}$$

The numerical coefficients of the last term in each case may be
obtained as before and have the same value, $\frac{N^2 - 1}{12}$. Hence, in
general,

$$r_N = \frac{r_1}{k^2} + \frac{d^2(N^2 - 1)}{12k^2\sigma_1^2}$$

But $k = \sqrt{1 + \frac{c^2(N^2 - 1)}{12}}$ and $d = c\sigma_1$.

Therefore,

$$r_N = \frac{r_1 + \frac{c^2(N^2 - 1)}{12}}{1 + \frac{c^2(N^2 - 1)}{12}} = \frac{12r_1 + c^2(N^2 - 1)}{12 + c^2(N^2 - 1)}$$

An equivalent form obtained by a slight algebraic change is

$$r_N = 1 - \frac{12(1 - r_1)}{12 + c^2(N^2 - 1)}$$

This is formula (2) of the article.
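
Formulas (1) and (2) can also be checked on constructed data. The sketch below is an illustration added here, not part of the original proofs; the base scores, the shift d = 3 and N = 4 are hypothetical values chosen so that forms A and B have equal standard deviations within each group, as the assumptions require.

```python
import math


def population_sd(values):
    """Standard deviation with divisor n, as used throughout the article."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))


def pearson(xs, ys):
    """Ordinary product-moment coefficient of correlation."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical single group: equal standard deviations on both forms.
base_a = [-2.0, -1.0, 0.0, 1.0, 2.0]
base_b = [-1.0, -2.0, 0.0, 2.0, 1.0]
d, n_groups = 3.0, 4  # assumed constant difference between group means

# Composite group: the same distribution shifted by d from one group to the next.
all_a = [x + j * d for j in range(n_groups) for x in base_a]
all_b = [y + j * d for j in range(n_groups) for y in base_b]

sigma1 = population_sd(base_a)
r1 = pearson(base_a, base_b)
c = d / sigma1
spread = c ** 2 * (n_groups ** 2 - 1)

sd_by_formula = sigma1 * math.sqrt(1 + spread / 12)  # formula (1)
r_by_formula = (12 * r1 + spread) / (12 + spread)    # formula (2)

if __name__ == "__main__":
    print(round(population_sd(all_a), 4), round(sd_by_formula, 4))
    print(round(pearson(all_a, all_b), 4), round(r_by_formula, 4))
```

In each printed pair the value computed from the pooled scores agrees with the value given by the formula.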

3. Composite PE of Estimate.—This is obtained by using the relation, $e_N = .6745\,\sigma_N\sqrt{1 - r_N^2}$. In this substitute for $\sigma_N$ and $r_N$ the values given by (1) and (2).

Then

\[ e_N = .6745\,\sigma_1\sqrt{\frac{12 + c^2(N^2-1)}{12}}\;\sqrt{\frac{144 + 24c^2(N^2-1) - 144r_1^2 - 24c^2(N^2-1)r_1}{[12 + c^2(N^2-1)]^2}} \]

\[ e_N = .6745\,\sigma_1\sqrt{\frac{12 - 12r_1^2 + 2c^2(N^2-1) - 2c^2(N^2-1)r_1}{12 + c^2(N^2-1)}} \]

\[ e_N = .6745\,\sigma_1\sqrt{(1 - r_1^2) + \frac{c^2(N^2-1)(1-r_1)^2}{12 + c^2(N^2-1)}} \]

\[ e_N = .6745\,\sigma_1\sqrt{(1 - r_1^2)\left[1 + \frac{c^2(N^2-1)(1-r_1)}{[12 + c^2(N^2-1)](1+r_1)}\right]} \]

But $.6745\,\sigma_1\sqrt{1 - r_1^2} = e_1$.

Therefore,

\[ e_N = e_1\sqrt{1 + \frac{c^2(N^2-1)(1-r_1)}{[12 + c^2(N^2-1)](1+r_1)}} \]

This is formula (3) of the article.
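As a check on the algebra, the probable error can be computed two ways: directly from $.6745\,\sigma_N\sqrt{1-r_N^2}$ using formulas (1) and (2), and from the closed form $e_1\sqrt{1 + c^2(N^2-1)(1-r_1)/\{[12+c^2(N^2-1)](1+r_1)\}}$. The sketch below is illustrative only; the function names and test values are assumptions.

```python
import math

PE_FACTOR = 0.6745  # probable error as a fraction of the standard error


def pe_direct(sigma_1: float, r1: float, c: float, n: int) -> float:
    """Substitute formulas (1) and (2) into e_N = .6745 * sigma_N * sqrt(1 - r_N^2)."""
    x = c ** 2 * (n ** 2 - 1)
    sigma_n = sigma_1 * math.sqrt(1 + x / 12)
    r_n = (12 * r1 + x) / (12 + x)
    return PE_FACTOR * sigma_n * math.sqrt(1 - r_n ** 2)


def pe_closed_form(sigma_1: float, r1: float, c: float, n: int) -> float:
    """Closed form in terms of the single-group probable error e_1."""
    x = c ** 2 * (n ** 2 - 1)
    e_1 = PE_FACTOR * sigma_1 * math.sqrt(1 - r1 ** 2)
    return e_1 * math.sqrt(1 + x * (1 - r1) / ((12 + x) * (1 + r1)))


# Hypothetical inputs; the two routes should agree to rounding error.
for args in [(10.0, 0.5, 0.3, 5), (4.0, 0.7, 1.0, 8), (1.0, 0.1, 0.6, 3)]:
    assert math.isclose(pe_direct(*args), pe_closed_form(*args))
```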



THE VALIDATION OF INTELLIGENCE TESTS
A. M. JORDAN

University of North Carolina

(Continued from September issue)

Thus far little reference has been made to the correlations obtained
by other investigators. It has seemed to the writer worth while to
project his findings on the background of those of others. He has
consequently gathered together a tentative bibliography composed of
references which contribute in some way to the question of test
validation. These articles have been analyzed and the results tabulated
so that, because of the large amount of work which has been
done on some tests, we can determine pretty surely just where we are
in the meaning of correlations which will be computed. The tests
will be taken up in turn and the inferences drawn concerning them.

Table XIV.—Distribution of Correlations between Alpha and High
School Grades

    Army Alpha      Frequency
       .19              2
       .21
       .23              2
       .25
       .27              2
       .29
       .31              1
       .33
       .35              3
       .37              5
       .39
       .41              3
       .43
       .45              2
       .47              2
       .49              2
       .51              2

    Number             20
    Median            .38
    Range         .19-.52


These correlations have been computed under a variety of conditions
which appear in the diversity of results. It is evident that a
correlation of .60 with high school grades is a high correlation; i.e.,
it is high compared with other computed r's with the same criterion.
The coefficient of .48 obtained by me is well up toward the highest
obtained by others. Moreover, the average of six coefficients between
Alpha and English is .39; of eight between Alpha and history .33; of
five between Alpha and general science .44; and of five between Alpha
and mathematics .36.


When sub-tests are considered, only in the case of Burtt and Arps
(12) are there comparative data. In their study we find the following
coefficients, which differ very decidedly from the writer's findings:

                 r
    Alpha 1     .06
          2     .14
          3     .27
          4     .27
          5     .26
          6     .16
          7     .18
          8     .37

Especially in the first four tests is this difference most apparent, for
while their average of these four is .185 that of this study is .43.
Which is the more typical only other investigators will discover. The
correlations of these sub-tests with mathematics, English, general
science, and history will have to wait for other investigators to be
checked up and compared.

Between Alpha and mental age five coefficients have been discovered
with an average of .72. The sub-test averages are given in the
following table:

Table XV.—Correlations of the Sub-tests of Alpha with Mental Age

                    Average of r's     N
    Alpha 1              .46           2
          2              .67           2
          3              .53           3
          4              .68           3
          5              .81           2
          6              .64           3
          7              .66           3
          8              .61           2
    Group test           .73           6


The correlation of .68 obtained in this study with mental age is 
seen to be somewhat lower than the average. There are some 
differences also among the sub-tests. 

The correlation of .61 with teachers' estimates resembles the .60
to .70 of the officers' estimates and faculty ratings of .69. As far as
the writer can discover no publication has been made of correlations
of teachers' estimates with sub-tests.

The average correlations of Alpha with other tests are

    With Terman                              .76 in 3 cases
    With Otis                                .76 in 6 cases
    With Miller                              .76 in 4 cases
    With Otis self-administering             .76 in 1 case
    With University of Minnesota Tests       .73 in 1 case
    With Haggerty, Delta 2                   .78 in 1 case
    With Myers Mental Measure                .35 in 1 case


The negative correlation of -.29 with age is matched with a -.29
obtained by Madsen and Sylvester (36).

Table XVI.—Distribution of Correlation Coefficients between Alpha
and University and Normal School Marks

                    Frequency
       .22              1
       .24              1
       .26              2
       .28              2
       .30              1
       .32
       .34              5
       .36              1
       .38              4
       .40              1
       .42              2
       .44              1
       .46              4
       .48              3
       .50              1
       .52              3
       .54              1
       .56              1
       .58
       .60
       .62
       .64
       .66              1

    Number             36
    Median           .415
    Range         .22-.67


The correlation of .48 obtained by the writer in a previous investigation is
slightly above the median of .415, and finally for this test there have been
several correlations computed between sub-tests and University marks.

Table XVII.—Correlations of Sub-tests of Alpha with Marks Obtained in
University and Normal School

    Sub-tests     Average     Number     Range
        1           .17         11       -.01 to .34
        2           .27         12       -.10 to .39
        3           .17         11       -.09 to .43
        4           .32         12       -.16 to .66
        5           .25         12       -.15 to .46
        6           .21         11       -.01 to .40
        7           .22         12        .00 to .41
        8           .29         12        .16 to .60


Thus there is no consistency in the correlations obtained. It would
appear as if some investigators have (1) made errors in computation,
or (2) have had groups differing widely in homogeneity, or (3) from
some other reason or combination of reasons have procured results
diverging widely from the average. Consider a negative correlation
of .16 with Test 4 and grades. Test 4 is composed of opposites. The
average of twelve correlations with grades in the case of this test is .32
and yet one investigator obtains a coefficient of -.16. The preponderance
of evidence is against the correctness of such findings.

This test (Army Alpha) has good reliability, being above .90.

Otis Group Test.—The Otis group test has been used very widely in
the high schools and furnishes abundant data for purposes of comparison.
The following table sets forth the correlations from grade IV
through the high school.


Table XVIII.—Average Correlations of Otis Test with School Marks in
Various Grades (Largely from Colvin 14)

    Grade    Average    Number    Grade                Average    Number
    IVA        .78                VIIIA                  .73         3
    IV         .02                VIII                   .68         7
    VA         .66                IXA                    .60         2
    V          .69                IX                     .62         6
    VIA        .61                X                      .36         1
    VI         .73                XI and XII             .42         1
    VIIA       .05                Whole High School      .44         0
    VII        .71         6

Average coefficients, Grades IV-VIII: .60 in 40 cases; Range .33-.91.
Average, all high schools: .49 in sixteen cases; Range .31-.82.
This average correlation of .49 is not much different from .45 obtained
by the writer in another investigation (31). This may well be compared
with an average of .60 with grades when form A and form B of
this group test are both used. The correlations with grades in normal
schools in three cases averaged .28 but the groups were quite
homogeneous. With the Stanford-Binet the Otis test has a fairly high
correlation. The average for grades V to VIII was in six cases .66 with
a range of .46-.76; for the high school .63 with a range of .44 to .73;
while, when grades V to XII were included the correlation was .80 in
one case. This average correlation of .63 in the high school corresponds
very closely with the .66 of this investigation. There are
few or no publications of correlations of sub-tests with Stanford-Binet
or with any other individual test. Otis correlates negatively
with age. The three coefficients found have an average of -.43
with a range of -.41 to -.46, giving substantiating evidence to the
excellence of the test in this particular.

Among those who have computed correlations with other tests
probably Franzen (21) has done as much as anyone else. Many of
the correlations below are taken from his findings.


Table XIX.—Correlations of Otis Test with Other Tests. Numbers in
Parentheses Indicate the Number of r's Entering into the Average
(Data largely from Franzen)


Hollby Va Dslta AtwtA Miller Tshwan National A Mbntimbtbr 


Reading 

.66(6) 

.82 .78(2) .76(3) 

.78(1) 

.76(3) 

.68(3) 

Constant. 



.81 

.72 


Tburstonb 

HAflOEHTT ILLIHOW PBESBET 

Rbadino 

National B 

Reading 

.60 

.78 .90 

.37 

.88 

.73 

Constant. 

-- 

.30 .67 

.36 

.,. 

.66 


Dsarsobn-I Dearborn-S Prdmbt Xoqt 

Wywb 

MYcna 

Univ, op Min« 

Rending 

.00 

.66 .74 

CD 

.60 

.66 

Constant. 

.63 

.44 .60 

.30 

.66 



Thus it is seen that correlations of this test with other tests called
by the same name and purporting to measure the same thing range
all the way from .37 to .90. If only we could discover which one really
measured intelligence the rest would be easy. Correlations of Otis
tests with three separate composites reveal interesting results. Otis
correlated with a composite of Otis, Alpha, Miller, and Terman gives
a correlation coefficient of .93; this test correlated with a composite
of National A and B, Haggerty Delta 2, Otis, Myers, and Kelly-Trabue
Language gives .68; and Otis correlated with the average of
Terman, Otis, Thurstone, and Rogers gives an r of .97.

The correlations obtained with teachers' estimates of intelligence
and the learning test have no other coefficients with which they might
be compared.

Terman Group Test.—The Terman group test of intelligence has
not been on the market as long as the preceding two but still has been
fairly widely used. With school marks the correlations are quite
similar to those of the preceding tests.

With marks the average of correlations is .47 in nine cases, with
a range of .30 to .67. Hines obtained results quite discrepant from
these, his coefficients averaging only .24 in nine cases. The writer is
inclined to think that Hines's results are not representative. The
correlation of this test with marks in Latin is .66 in one study; with
algebra .60 in one case; with history .40 in five cases; with mathematics
.26 in thirteen cases with a range of -.18 to .44; with science .14 in
one case; and with general science .54 in four cases. The correlations
of the sub-tests with grades have no other coefficients with which they
could be compared. However, Professor Terman sent me the following
correlations with corrected grade location which, according to him,
is the best measure of "educated" that we have. These correlations
were made with sub-tests which were slightly longer than those which
now compose the test.


    Terman sub-test     Corrected Grade Location
          1                     .697
          2                     .664
          3                     .663
          4                     .698
          5                     .48
          6                     .044
          7                     .84
          8                     .58
          9                     .64
         10                     .373


Therefore this test shows a high correlation with the amount of school
knowledge which individuals have acquired. This corroborates to a
certain extent the writer's findings that “For all subjects combined
Terman stands above the rest because of Test 1 with a coefficient
of .565 and because the correlation between the group tests and all
subjects is .492.”

The correlation of the tests as a whole with the Stanford-Binet
ranges from .35 to .75 with an average at .64 which is somewhat below
the present finding of .68.

Here again the reliability is high.

In collecting correlations with other tests the writer found the work
of Franzen (21) a storehouse of information.


Table XX.—Correlations of Terman with Other Tests. The Numbers in
Parentheses Represent the Number of r's Entering into the Average
(Data largely from Franzen)

CosirosiTB Natidnal a Otib HAeaBUTT Illimoib PnEasBY SgRVBV 


Reading 

.92 


.86 

.80(6) .85 

.83 

.82 

Constant, 



.38 

.67 .68 

.62 

.70 


MestlMETEtl 

Naviokal B Dbarborv'I 

nHAWIQIlN-2 PRBaeBY XoBT 

Reading 

.89 



.76 .68 

.70 

.68 

Constant. 

.02 
The correlations with individual tests range from .56 to .89 with an
average at .72, which should be compared with the coefficients obtained
in this study of .70 with Miller, of .71 with Alpha, and of .78 with Otis.

With age the correlation is negative but not quite so largely
negative as was Otis, the average of five coefficients being here -.33.

Miller Group Test.—This group test has been off the press such a
short time that comparisons can hardly be made. I am indebted to its
maker for several correlations which I shall give. The correlations
with school marks are .56 and with a grammar test .39. The average
correlation found by the writer in a previous study is .48.

With teachers' estimates, Stanford-Binet, and the learning test
there are no coefficients for comparative purposes. Particularly
interesting would be the correlations of this test with Stanford-Binet,
since the Miller test ranked lowest with this criterion.

There are some interesting correlations with other tests. 
Table XXI.—Correlations of Miller Group Test with Other Intelligence
Tests

(Data furnished largely by Miller)

Numbers in Parentheses show Number of r's entering Average

    Univ. of Minn.     .72(2)
    Mentimeter         .60(3)
    Myers              .32(2)
    Otis               .76(4)
    Terman             .73(4)
    Delta 2            .78(1)
    Alpha              .76(4)
    Thorndike          .82

Thus the correlations range from .32 to .82 with the average at .69.
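The average of the tabulated Miller coefficients can be recomputed either as a simple mean of the eight entries or as a mean weighted by the number of r's behind each entry. The sketch below uses the coefficients as printed in the table; the count of 1 for Thorndike is an assumption, since no number in parentheses is given for it.

```python
# (coefficient, number of r's entering the average) for each test, as tabulated.
MILLER_CORRELATIONS = {
    "Univ. of Minn.": (0.72, 2),
    "Mentimeter": (0.60, 3),
    "Myers": (0.32, 2),
    "Otis": (0.76, 4),
    "Terman": (0.73, 4),
    "Delta 2": (0.78, 1),
    "Alpha": (0.76, 4),
    "Thorndike": (0.82, 1),  # count assumed; none is printed
}


def unweighted_mean(table: dict) -> float:
    """Simple mean of the tabulated averages."""
    return sum(r for r, _ in table.values()) / len(table)


def weighted_mean(table: dict) -> float:
    """Mean weighted by the number of coefficients behind each tabulated average."""
    total_n = sum(n for _, n in table.values())
    return sum(r * n for r, n in table.values()) / total_n


simple = unweighted_mean(MILLER_CORRELATIONS)
weighted = weighted_mean(MILLER_CORRELATIONS)
```

Comparing the two is a quick way to see whether a few heavily replicated coefficients dominate the reported average.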

When composites are made Miller stands well. The coefficient
of correlation of Miller with a composite of Miller, Terman, Alpha, and
Otis is .90; with a composite of Miller, Delta-2, Terman A, Alpha, and
Mentimeter is .90; while a composite of Univ. of Minn., Mentimeter,
Thurstone IV, Myers Mental Measure, Otis, Terman, Miller A and B
has a correlation of .88 with Miller B and .85 for Miller A.

The reliability coefficient ranges from .86 to .91 with an average
at .90.


SUMMARY

The summary will be given in order according to the plan already laid
down on pages 350 and 351.

I. (1) The correlations of the four group tests (Alpha, Miller, Terman,
and Otis) with mental age (Stanford-Binet) are fairly high. Three
of them hover around .68 while the fourth (Miller) has a correlation
considerably lower (.53). Of the sub-tests, Alpha-4 (Opposites)
has the highest coefficient (.61) but several others stand well in this
respect. Some of these latter are: Otis-2 (Opposites) .59; Otis-4
(Proverbs) .57; and Terman-3 (Opposites) .57. It is noteworthy
that any of the sub-tests mentioned has a higher correlation with
mental age than the Miller group test.

2. With age the correlations in all group tests and with all sub-tests
save one are negative. The one exception is Terman-9 (Classification).
This substantiates the usual dictum that in not too large a
range the younger pupils are on an average the brighter. The highest
negative correlation is -.455 with Otis-9 (Story Completion), this
being even higher than the correlations with age of any of the group
tests taken as a whole.

3. With average grades the correlations of the four group tests are
grouped around .47. The lowest is .45 (Otis); the highest is .49 (Terman).
Among sub-tests Terman-1 (Information) has the highest
correlation (.65) while Otis-10 (Memory) has the lowest (.14). When
individual subjects in the high school curriculum are considered we
find that Miller correlates highest with English (.56) while Otis has the
lowest coefficient (.47). Moreover among the sub-tests Miller-1
(Mixed Sentences) correlates highest of all (.69). The highest and
lowest coefficients in other subjects are: Mathematics: highest Otis-5
(Arithmetic Problems .68), lowest Otis-3 (Disarranged Sentences
.035); General Science: highest Terman Group .64, lowest Otis-8
(Similarities .10); and History: highest Terman-6 (Sentence Meaning .69),
lowest Otis-1 (Hard Written Directions -.21).

4. The coefficients of correlation between intelligence tests and
teachers' estimates are unusually high, no group test being below .60
and one (Otis) having the high correlation of .73. The sub-tests also
show in general significant correlations with this criterion, ranging from
.24 in Alpha-1 (Hard Oral Directions) to .63 with Terman-3 (Opposites)
and .62 with Alpha-2 (Arithmetic Problems). It may be observed
that, as far as teachers' estimates of intelligence are concerned, either
of the last two mentioned is as good as Army Alpha taken as a whole.

5. The correlations with learning test are uniformly low. The
coefficients in general are grouped around .20, the highest being .31
with Alpha-2 (Arithmetic) and the lowest -.126 with Terman-8 (Mixed
Sentences). Among the tests taken in groups Otis with .23 is highest
and Miller with .17 is lowest, although “high” and “low” here have
little significance.

6. When a composite of the four tests is made and this used as a
criterion, and correlations made with it, the coefficients in general are
unusually high. The group tests correlate around .90 with little to
choose between them while among the sub-tests Terman-3 (Opposites)
has the unusually high correlation of .83, and Alpha-1 (Hard Oral
Directions) has the lowest of all, .395.

II. In all cases but one (grades) the factor of age was made constant 
by means of the coefficient of partial correlation. The result of this 
statistical treatment was to decrease the coefficient in practically all 
cases from .01 to .06. 
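The article does not print the formula used to hold age constant. A minimal sketch, assuming the standard first-order partial correlation $r_{12.3} = (r_{12} - r_{13}r_{23}) / \sqrt{(1-r_{13}^2)(1-r_{23}^2)}$, is given below; the function name and the numerical inputs are hypothetical and are not taken from the study.

```python
import math


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """First-order partial correlation of variables 1 and 2 with variable 3 held constant."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))


# Hypothetical illustration: test vs. criterion r = .60, each correlating +.30 with age.
test_vs_criterion = 0.60
test_vs_age = 0.30
criterion_vs_age = 0.30
adjusted = partial_correlation(test_vs_criterion, test_vs_age, criterion_vs_age)
```

When the third variable is uncorrelated with both of the others, the partial coefficient equals the raw coefficient, so the size of the adjustment depends entirely on how strongly age relates to the test and to the criterion.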

III. (1) The average displacements from corresponding thirds when
two tests were compared ranged from .28 to .47 per cent depending
upon the degree of similarity between the tests, Otis and Miller having
the smallest displacement and Terman and Alpha the largest.

2. When correlations in the regression equation were assumed to
be perfect and transmutations were made from the three tests to the
fourth and the average differences computed, the largest average
difference was in the case of Terman, 15.4 units; the smallest with Miller,
8.6. However, when each of these averages is divided by the SD's of
the respective tests Otis has the smallest, 2.45; Miller next 2.46;
Terman next 2.59; and Army last 2.63.

IV. The average of the collected correlations of the tests with various
criteria is quite similar to the findings already reported. These are
Alpha .38 (high school), .42 (University), Otis .44, Terman .47, and
Miller .56. In the case of mental age Alpha stands first with .72,
Terman next .64, and Otis .63. I discovered no correlation of Miller
with mental age. With teachers' estimates only Alpha, ranging from
.50 to .70, could be found. With age the two discovered are quite
similar to the results obtained in this study, Otis -.43 and Alpha -.29.
There are correlations made also with various composites but no clear
cut inferences can be drawn, and finally correlations have been made
with other tests with varying results.

V. A variety of studies has been made, using the tests discussed
here. Sometimes merely tabulated results of the scores have
been given; sometimes rather elaborate statistical procedures have
been used with some interpretation of results. Only the latter have been
included in the present bibliography and among these, all have been
omitted which had neglected the idea of comparison of the tests with
some criterion. In general there has appeared a feeling of disappointment.
In one extreme case so strong a word as “contra-indicated”
is used by Bridges in reference to the helpfulness of the Alpha tests in
foretelling success in college subjects. On the other hand
some (among them Colvin) have shown in a convincing way the value
of the tests in a variety of relations.

VI. In considering sub-tests no other test stands so high with all
criteria as Opposites. Arithmetic Reasoning stands next to Opposites
as a useful sub-test for a group test of intelligence, then come in order
Geometric Figures and Proverbs.

VII. As far as our data go, Otis is the best all round test for testing
intelligence at the high school age; Terman ranks next; Alpha third, and
Miller fourth. Terman and Alpha are so close together that they
almost tie for the second place. This inference assumes that practically
all the criteria are of equal value. But weighing them almost
as you choose Otis would be ahead.

VIII. Discussion and Conclusion.—If we ask now what tests
measure, this investigation cannot answer definitely. Certain findings,
however, do throw light on this question. Each of the four tests used
is related to the several criteria used in varying degrees of closeness.
We now know that “capacity to learn” is not a good definition,
because it has been demonstrated that at least in one simple
case of ideational learning there was only slight correlation with the
test. My opinion is that intelligence comes more nearly being the
capacity to learn when the material learned is difficult for the learner.
The “capacity to learn” includes too much.

The tests used do differ from each other but only in one case
(mental age) widely enough to be significant. Apparently large
determining differences turn out to be .03 to .07 which might be changed
in another investigation. This in a multitude of discouraging findings
is a hopeful sign. Did one know the weight to be attached to the
various criteria and if they could be perfectly combined perhaps one
composite criterion might be used with which tests would correlate
much more highly than they do with any of them taken singly.

Now for the various criteria used. Mental age is limited in two
ways: (1) It limits the possible IQ of some at thirteen and fourteen,
and of many at fifteen to sixteen and older. After thirteen or fourteen
years if an unusually bright child should be measured from year to
year his mental age would remain constant while his chronological
age was increasing, thus causing a lowering of the IQ. This limits the
usefulness of this instrument. It seems in some cases a poorer instrument
than the group test, for the IQ correlates only .57 with teachers'
estimates, .70 with composite, and -.16 with age, in each case lower
than any one of the group tests. (2) The tests themselves are limited
in depending too much on reproduction of haphazard numbers and
definitions of words. One glaring error appears in putting in the
question of distinguishing between “character” and “reputation,” for
at least among those persons that I have tested this difference has been
learned by heart either in the copy book or in some other place.
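The ceiling effect described under (1) follows from the ratio definition of the IQ, mental age divided by chronological age and multiplied by 100. The sketch below assumes that definition; the mental age of 18 and the ages used are hypothetical values chosen only to show the direction of the effect.

```python
def intelligence_quotient(mental_age: float, chronological_age: float) -> float:
    """Ratio IQ: 100 * mental age / chronological age (both in years)."""
    return 100.0 * mental_age / chronological_age


# Hypothetical bright pupil whose measured mental age has reached the scale's ceiling.
CEILING_MENTAL_AGE = 18.0
iq_by_age = {
    age: intelligence_quotient(CEILING_MENTAL_AGE, age) for age in (13, 14, 15, 16)
}
# Each later testing yields a lower IQ although the pupil has not changed.
```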

In teachers' estimates, at least in this study, there appears the most
important extra-test criterion. The estimates were so carefully made
by teachers of mature opinions who had been trained to look for
evidences of intelligence that the average results can be taken almost at
face value. Those tests, therefore, which correlate highly with this
criterion have in my opinion much in their favor.

The learning test was a distinct disappointment. That it seems 
difficult enough may be determined by attempting to work out the 
letter occurring midway between the twelfth and eighteenth. More¬ 
over it appeared that twelve practice periods of three minutes each 
would be time enough for considerable learning and that the average of
the first three subtracted from the average of the last three would be 
at least “capacity to learn,” and yet the correlations with it were 
exceedingly low. It would be interesting to try correlations with the 
learning of more difficult material. 
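The learning measure described above, the average of the first three practice periods subtracted from the average of the last three, can be sketched as follows. The function name and the sample scores are hypothetical; only the twelve-period design and the three-period averages come from the text.

```python
def learning_gain(period_scores: list) -> float:
    """Mean of the last three practice periods minus the mean of the first three."""
    if len(period_scores) != 12:
        raise ValueError("expected scores for twelve practice periods")
    first_three = sum(period_scores[:3]) / 3
    last_three = sum(period_scores[-3:]) / 3
    return last_three - first_three


# Hypothetical pupil: items completed in each of twelve three-minute periods.
sample_scores = [4, 5, 6, 6, 7, 8, 8, 9, 9, 10, 11, 12]
gain = learning_gain(sample_scores)  # (10 + 11 + 12) / 3 - (4 + 5 + 6) / 3
```

A gain score of this kind depends on both the starting level and the final level, so two pupils with the same gain may differ considerably in absolute performance.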

Age seems an important enough factor but not to be depended
upon too much, since with a wider range the correlations become positive
and since, at present, just the width of the range at which the
change takes place is not known.

Composites always seem to me like loading the dice before the
throw is made and are much poorer validating material than any of
the other criteria.

Grades are possibly next in importance to mental age. They have 
the advantage of extending over long periods of time and of eliminating 
ups and downs due to chance variations but are so complicated with 
other factors that intelligence is an unknown factor. 

The most hopeless results obtained were those concerned with
variations in scores between two tests that have high correlations
with each other. The only hope of ever improving our results that
seems feasible is to throw out all cases of unusual variations and test
them individually. One might lay down this safe dictum, that if the
individual's scores on various tests are within 2 PE of the average
variation of tests from each other (see page 363) then their scores
should be thrown out and they be tested further.

Recommendations 

It is recommended that the National Research Council be prevailed 
upon to undertake a great testing program in order to perfect a 
standard series of group tests of intelligence suitable for testing high 
school pupils and adults. 
PREDICTING ACADEMIC SUCCESS

MARK A. MAY
Syracuse University

If all the factors that contribute to success in college were known so
that intercorrelations could be obtained between them and also
between each factor and some measure of success, a formula could
then be written for predicting success. The general form of this
formula would be:

\[ \text{Success} = X_1A + X_2B + X_3C + \cdots + X_nN + K \]

in which $X_1, X_2, X_3, \ldots, X_n$ are the factors and A, B, C, . . . N a
system of weights so arranged as to yield the maximum correlation
between the right and left hand sides of the equation. Thus to predict
the success of a given individual all we need do is to substitute his
score in each of the factors for $X_1$, $X_2$, $X_3$, etc. and multiply each by its
appropriate weight. The precision or exactness of the prediction will
depend partly on the size of the correlation between the right and left
hand sides of the equation (i.e., the correlation between actual success
and predicted success in a large unselected sample) and partly on the
size of the standard deviation of the measure of success. For the
standard deviation of the distribution of errors is given by the formula:

\[ \sigma_s\sqrt{1 - R^2} \]

in which $\sigma_s$ is the standard deviation of the measures of
success, and R is the correlation between the actual measures of success
and the predicted values. It is obvious that if R is large, say .90 or
more, it does not matter how large $\sigma_s$ is for the whole thing approaches
zero. But if R is as low as .80 it makes considerable difference how
large $\sigma_s$ is.
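The weighted-sum prediction and the standard deviation of the errors of prediction can be sketched together. The error formula is taken here to be the usual standard error of estimate, $\sigma_s\sqrt{1-R^2}$; the function names, weights, and scores below are hypothetical and serve only to show how the quantities combine.

```python
import math


def predict_success(scores: list, weights: list, constant: float) -> float:
    """Success = X1*A + X2*B + ... + Xn*N + K for one individual."""
    if len(scores) != len(weights):
        raise ValueError("one weight is needed for each factor score")
    return sum(x * w for x, w in zip(scores, weights)) + constant


def error_sd(sigma_s: float, multiple_r: float) -> float:
    """Standard deviation of prediction errors: sigma_s * sqrt(1 - R^2)."""
    return sigma_s * math.sqrt(1 - multiple_r ** 2)


# Hypothetical figures: the same sigma_s under different sizes of R.
for r in (0.60, 0.80, 0.90):
    spread = error_sd(10.0, r)  # shrinks as R rises
```

Holding $\sigma_s$ fixed and varying R in this way shows how quickly, or slowly, the spread of the errors falls as the correlation improves.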

This is merely a brief statement of the general mathematical
principles on which such predictions depend; they are well known to
those acquainted with statistical methods. Our main problem is that
of defining and measuring academic success and of discovering and
measuring the elements that compose it.

What is academic success? In general, it is intellectual achievement,
leadership in student affairs, development of a proper view of
life, choosing a vocation, building character, etc., etc. For the present,
at least, and until such a time as authorities may agree on a definition
we must narrow it down to something definite and concrete. For the
purposes of this investigation we shall define academic success as
intellectual achievement and assume that it is measured by college
grades or marks. Or more precisely, we shall assume that the quality
of intellectual work is measured by marks and that quantity is
measured by semester hours. The only defense for such assumptions
is that these are our only available means of measurement.

The purpose of this study is to ascertain how accurately the academic
success of 460 Liberal Arts freshmen could have been predicted.
As a measure of their intellectual achievement we have their “credit
points” or “honor points” obtained at the end of the first semester.
The grade of A, which means very superior work, carries 3 honor points
per semester hour; the grade of B gives 2 honor points; C gives 1
honor point per semester hour and is regarded as an average grade;
D is passing but gives no honor points. This system is, like all
others, entirely arbitrary but practical. It enables us to use the
student's total number of honor points as a measure of his intellectual
achievement. Since the normal load for a freshman is 16 semester
hours, the maximum number of honor points obtainable is 48. The
honor points of the group under consideration were distributed as
follows:

    Honor points   0  3  6  9 12 15 18 21 24 27 30 33 36 39 42 45 48

    Frequency     30 32 40 41 47 47 40 46 36 26 24 10 11 10  7  4  3
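The honor-point scheme described above amounts to a weighted sum of semester hours. The following sketch reads the four grades as A, B, C, and D with 3, 2, 1, and 0 points per semester hour; the function name and the sample schedule are hypothetical, and failing grades are left out because the text does not say how they are scored.

```python
# Honor points per semester hour for each passing grade, as described in the text.
POINTS_PER_HOUR = {"A": 3, "B": 2, "C": 1, "D": 0}


def honor_points(courses: list) -> int:
    """Total honor points for a list of (grade, semester_hours) pairs."""
    return sum(POINTS_PER_HOUR[grade] * hours for grade, hours in courses)


# A full 16-hour load of A work gives the stated maximum of 48 points.
maximum = honor_points([("A", 16)])

# Hypothetical freshman schedule totalling 16 semester hours.
sample = honor_points([("A", 3), ("B", 3), ("C", 5), ("D", 5)])  # 9 + 6 + 5 + 0
```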

Proceeding on the assumption that honor points measure intellectual
achievement, our problem is to predict the honor points any
student will make in a given time. Some of the factors on which
intellectual achievement depends are: (1) General intelligence as
measured by some standardized intelligence test; (2) preparatory
school work as measured by the quantity of work (units) and by the
quality of work (grades); (3) the industry or application of the student
to his intellectual tasks measured roughly by the number of hours per
week spent in study; (4) the mental efficiency of the student, or
knowledge of how to use his mind, which factor we cannot measure; (5)
interest in work, or strength of incentives to learn, or motives for
being in college, another factor which defies measurement; (6) certain
traits of character and personality factors; and finally (7) health and
physical and social environment. This is by no means a complete
list of the factors on which academic success depends. There are
doubtless many other subtle elements that play a part. It remains for
scientific investigations to isolate these and measure them.

The general intelligence of the experimental group under consideration
was measured by a combination of the Miller Mental Ability
Test and the Dartmouth Completion of Definitions Test. The Miller
Test contains 120 elements and the Dartmouth test contains 40;
thus the maximum combined raw score is 160. The reason for
combining the raw scores is that this turned out to be about the optimum
combination which gives the maximum correlation with honor
points. The distribution of the combined scores is given below:

    Score        Frequency
    140-              1
    130-139          13
    120-129          27
    110-119          87
    100-109         103
    90-99           118
    80-89            46
    70-79            25
    60-69            16
    50-59             6


The correlation between general intelligence as measured by these
tests and honor points obtained at the end of the first semester is
+.60. Since this correlation is somewhat higher than those usually
obtained between intelligence and college marks it seems advisable
to present here the correlation table.

[Correlation table of intelligence scores and honor points; r = +.60]


One interesting thing about this table is that individuals who score
above the mean in intelligence (about 100 points) are more likely to
make less than the mean number of honor points (about 18) than those
who score less than the mean in intelligence are to make more than the
mean in honor points. That is, the correlation is higher in the upper
ranges than in the lower. The reason is probably that some individuals
may have a high intelligence rating and still do poor work in college
because of lack of application, or poor methods of study, or something
else of this nature, while on the other hand a student who is low in
intelligence can make high marks only by good methods and great
industry. Thus the tendency exhibited above may very well be due,
in part at least, to the “tendency to least effort” so common among
many college students, even the brighter ones. Indeed, we have data
to show that those who had intelligence scores above the mean and who
received less than the mean number of honor points actually studied
from 4 to 10 hours a week less on the average than those of the same
level of intelligence who received more than the mean number of honor
points. All of this suggests that industry or application to work is an
important factor. Presently we shall show that the partial correlation
between intelligence scores and honor points with hours per week
of study constant is +.805.

As a measure of high school preparation, the number of units offered
and the average grade obtained were used. These are no doubt
inadequate measures but are the best we have. The correlation
between units offered for entrance and honor points obtained at the
end of the first semester is +.22. One reason for this low correlation
is the skewed distribution of units. This distribution is given below.

Units       14   15   16   17   18   19   20   21   22   23

Frequency   40  173  117   68   30   19    0    4    1    1

The quality of the high school preparation was measured in terms
of the average grade obtained in the work offered for entrance. If
the student offered in addition his Regents Examination marks in
the State of New York, or if he offered College Entrance Board
Examination marks, these were used in every case as being more
reliable than the high school averages. The correlation between
these measures and honor points is +.405.

Assuming that units is a measure of the amount, or quantity,
of preparatory work; and that average grades is a measure of the
quality of this work; and assuming further that quantity times quality
is a better measure of work than either quantity or quality taken alone,
we would expect to get a higher correlation from a combination of
these two. When we multiply them, that is when we weight the units
by the grades obtained in them, the correlation of this product with
honor points is +.24. This is little better than units alone and very
much worse than grades alone. A linear combination in which “best”
weights are assigned yields a maximum correlation with honor points
of +.406, which is no better than average grades taken alone.

Application to work, or general industry, was measured by
the average number of hours per week each student spent in study. The
value of such a measure obviously depends on its reliability. The
question here is how much dependence can be put in a student’s statement
of the average number of hours per week he spends in study.
The information was received in two ways. Early in the semester
each student was asked to fill in a card telling how he spent his time
each week. There was a blank space for “hours spent in sleeping,”
“hours spent at meals,” etc. Among the many items was “hours
spent in actual preparation of assigned work.” An effort was made
to convey the impression that we were interested primarily in how
they spent their total time rather than in the time spent on any one
item. Thus the temptation to over-state the number of hours spent at
study was partially overcome. Again after the middle of the semester
each student was asked to fill in another card which called for certain
information concerning each course. One item of information was
the average number of hours per week spent studying each subject.
These results were totaled and each total compared with the previous
statement. The correlation between the two statements was surprisingly
high, +.86. It could hardly be expected to be perfect since
some students were studying more and some less at the middle of the
semester than at the beginning. The statement used in this study was
the one given at the middle of the semester since it was felt that the
average number of hours per week spent in study for the middle weeks
of the semester was more representative of the students' industry than
that of the first weeks. The correlation between this measure and
honor points is +.32.

So far we have made no attempt to measure such factors as mental
efficiency, character traits, personality, health, and environmental
influences, all of which play a prominent part in academic work. Just
how much each of these factors contributes to success in college remains
to be determined. Our immediate problem is to determine what predictions
can be made from the factors which we have roughly measured.

Ultimately the reliability of prediction depends on the correlation
between the instrument of prediction and the thing to be predicted.
In case there is more than one agency of prediction the intercorrelations
must also be known. Accordingly the table of intercorrelations is
presented below.



                               Honor    Intelli-   High school     Units   Hours per week
                               points   gence      average grade           of study

Honor points                   1.00      .60         .40            .22       .32
Intelligence                    .60     1.00         .36            .20      -.35
High school average grades      .40      .36        1.00            .40       .11
Units                           .22      .20         .40           1.00       .25
Hours per week of study         .32     -.35         .11            .25      1.00


Since predictions require also means and standard deviations we 
present a table of them. 



                               Means    Standard
                                        deviation (σ)

Honor points                    18.6
Intelligence                   100.6      16.8
High school average grade       79         7.6
Units                           16.1       1.5
Hours per week of study         24




With these data we are ready to see how well we can predict
intellectual achievement from the above mentioned factors. The
formula for predicting X from Y is:

X = r_xy (σ_x / σ_y)(Y - M_y) + M_x

The standard deviation of the errors of prediction is given by

σ_e = σ_x √(1 - r_xy²)

The formulas for predicting honor points from the factors given
above when taken one at a time are:

Honor points =                      σ of errors of prediction

 .42 Intelligence - 25                       8.9
 .59 High school average - 28               10.1
1.61 Units - 8                              10.6
 .58 Hours per week + 4                     10.4

For example, if a student has a score of 140 in intelligence his most
likely number of honor points as predicted from his score will be
.42 × 140 - 25 = 34. The 8.9 tells us that the chances are about
2 to 1 that his honor points will be between about 25 and 43; or in other
words, it simply tells us that of all those who score 140 in intelligence
their mean number of honor points is 34 and the standard deviation of
their distributed honor points is 8.9. Such predictions are not worth
very much for individual cases but are better than nothing.
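
The worked example above can be retraced in a few lines. This is only a sketch of the arithmetic: the slope .42, the constant -25 and the error figure 8.9 are the values quoted in the text, while the function names are invented for the illustration. The "2 to 1" band is taken here as one standard deviation of the errors on either side of the predicted value, which assumes the errors are roughly normal.

```python
def predict_honor_points(intelligence, slope=0.42, intercept=-25.0):
    """Most likely honor points from the single-factor equation quoted in the text."""
    return slope * intelligence + intercept


def two_to_one_band(predicted, sd_error=8.9):
    """One standard deviation of the errors of prediction on each side
    (about 2-to-1 odds if the errors are roughly normal)."""
    return predicted - sd_error, predicted + sd_error


predicted = predict_honor_points(140)
low, high = two_to_one_band(predicted)
print(round(predicted), round(low), round(high))
```

Swapping in the slope, constant and error figure of another row of the table gives the corresponding band for that factor.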

When these factors are combined into a single equation the reliability
of the prediction is increased. The general form of such an
equation is stated at the beginning of this paper. The problem, of
course, is to find the best system of weights to attach to the factors so
as to get the most reliable instrument of prediction. The statistical
labor involved in doing this is great, especially when there are more than
three factors. As the number of factors increases the work multiplies
enormously. Several statistical short cuts have been proposed. The
method of correlation determinants was used here.*

When we put all of the factors which we have measured into a
single equation and give to each its “best” weight so that the composite
will yield the maximum correlation with honor points, we
obtain the following:

Honor points = .58 intelligence + .14 average grades - 1.03 units
               + 1.10 time - 62                                      (1)

Thus if a student has an intelligence score of 120, and a high school
average of 75, and offers 15 units for entrance, and studies on the
average 30 hours a week, his most probable number of honor points is:

(.58 × 120) + (.14 × 75) - (1.03 × 15) + (1.10 × 30) - 62 = 36
The standard deviation of the errors of prediction is given by the
formula

σ_hp √(1 - R²)

in which R is the correlation
between the right and left hand sides of the above equation or, what
amounts to the same thing, the correlation between the actual number
of honor points obtained and the number predicted by this combination
of factors. R is obtained directly from the formula

R = √(1 - D / D₁₁)

in which D is the correlation determinant and D₁₁ is its first minor.

* Kelly, T.: “Statistical Method,” Chap. XI. Also “Memoirs of the National
Academy of Science,” Vol. XV, Part 3, Chap. II.

In this case R = +.84. The standard deviation of the errors of
prediction is 6.0 and the probable error of prediction is .6745 times
the standard deviation, or 4.04. Thus, in the above case where
the most probable achievement in honor points is 36, the chances
are even that they will not be less than 32 or more than 40.
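
The substitution into equation (1) and the even-chance limits can be checked with a short sketch. The weights are those printed in equation (1) and the error figure 6.0 is the one quoted above; the function and variable names are assumptions made for the illustration.

```python
PROBABLE_ERROR_FACTOR = 0.6745  # probable error = .6745 x standard deviation


def honor_points_eq1(intelligence, hs_average, units, hours):
    """Equation (1) with the weights as printed in the text."""
    return (0.58 * intelligence + 0.14 * hs_average
            - 1.03 * units + 1.10 * hours - 62.0)


def even_chance_band(predicted, sd_error=6.0):
    """Limits within which the actual value falls with even chances."""
    pe = PROBABLE_ERROR_FACTOR * sd_error
    return predicted - pe, predicted + pe, pe


predicted = honor_points_eq1(120, 75, 15, 30)
low, high, pe = even_chance_band(predicted)
print(round(predicted, 2), round(pe, 2), round(low), round(high))
```

The unrounded prediction comes out a little under 36, and the limits round to the 32 and 40 given in the text.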

It is obvious that the reliability of any agency of prediction is
measured by the size of the errors; and the errors depend on the degree
of correlation between the agency and the thing predicted. Just as
the reliability of an intelligence examination depends on the correlation
between the aggregate of tests and some "criterion" of intelligence
and not on the number of tests or the length of them, just so do predictions
of this sort depend on the correlation and not on the number
of factors entering into the composite. Accordingly we find that we
can eliminate the factor of units from our equation and still get a
correlation of .838, whereas leaving units in we only get .84. And
furthermore the PE of the errors of prediction rises very slightly.
But when we eliminate units we have new equations with different
constants:

Honor points = .66 intelligence + .083 average grades + 1.06 time - 70     (2)

The low weight attached to high school averages indicates that this
factor, too, might be eliminated without doing serious damage to our
instrument of prediction. When we do this the equation becomes:

Honor points = .62 intelligence + 1.2 time - 70     (3)
R = .825 and σ_e = 6.3 and PE = 4.10

Thus by knowing the intelligence of a student and knowing the
time he spends at study we can predict his honor points with but
slightly greater error than we could if we knew also the units he offered
and the average grade he made in them. This all comes about because
of the negative correlation between time spent at study and scores in
intelligence tests. It is this same negative correlation that causes
grades to have such a low weight in equation (2) and units to have a
negative sign in equation (1). Indeed, this negative correlation is one
of the most significant facts of the whole study.
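
The multiple correlation reported with equation (3) can be checked from the correlation determinant. The sketch below uses the standard relation R² = 1 - D/D₁₁, where D is the determinant of the full correlation matrix (criterion in the first row and column) and D₁₁ is the determinant left after striking out that row and column. The three correlations are those given in the paper (+.60, +.32, and -.35 between intelligence and hours of study); the helper names are invented for the illustration, and the cofactor expansion is chosen only because the matrices are tiny.

```python
import math


def det(m):
    """Determinant by cofactor expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    total = 0.0
    for j, value in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        total += (-1) ** j * value * det(minor)
    return total


def multiple_r(corr):
    """Multiple correlation of variable 0 with all the others.

    corr is the full correlation matrix with the criterion in row/column 0.
    """
    predictors_only = [row[1:] for row in corr[1:]]
    return math.sqrt(1.0 - det(corr) / det(predictors_only))


# Order: honor points, intelligence, hours per week of study.
corr = [
    [1.00, 0.60, 0.32],
    [0.60, 1.00, -0.35],
    [0.32, -0.35, 1.00],
]
print(round(multiple_r(corr), 3))
```

With these rounded correlations the result agrees with the R reported for equation (3) to about the third decimal. Without the negative correlation between intelligence and study time the two factors together would correlate noticeably lower with honor points.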

Unfortunately, the factor of application or industry as measured
by time spent in study is of no use in predicting the success of a
candidate for entrance to college, simply because we have no measure of
it until the student has been in college for at least a few weeks. Leaving
it out of the equation we have:

Honor points = .37 intelligence + .29 average grades + .27 units - 47     (4)
R = .64 and σ_e = 8.6

It is interesting to note here that intelligence alone is almost as
good a means of prediction as the three factors in equation (4), for
intelligence alone correlates .60 with honor points and the standard
deviation of the errors of prediction is but slightly higher, 8.9.

So far everything points to the conclusion that intelligence and
industry are the two most important factors in academic success, in
comparison with which high school preparation does not count for
much. This hypothesis can be tested further by the partial correlation
technique. The partial correlation between honor points and
units with intelligence constant is +.127; with average grades constant,
is +.071, and with time spent in study constant is +.043; with intelligence,
high school averages, and time spent in study all three constant
is -.218. All of this means that there is no causal relationship between
the number of units a student offers for entrance and the number
of honor points he will obtain in the first semester, so far as our data
are reliable. It does not mean, on the other hand, that a student with
zero units is just as likely to succeed as a student with an infinite
number, simply because the scale of units with which we are dealing
runs from 14 to 23 and not from zero to infinity. To state the case
precisely we should say that a student offering 14 or 15 units has the
same chance of success as a student offering 20 or more units, other
things being equal. The “other things” are intelligence, industry, etc.
However, passing now from facts to speculations, the writer is of the
opinion that even if the units distribution ranged from say 5 to 50 units
the correlation with honor points would not rise above .35 or .40 and
that the same low partial correlations would obtain when intelligence
and industry are kept constant.
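
Partial correlations of this kind can be recomputed from the table of intercorrelations. The sketch below applies the usual first-order formula recursively, holding one more variable constant at each step. The correlations are the rounded two-decimal values from the paper, so the results can differ from the printed figures in the third decimal; the short variable names and the function names are assumptions made for the illustration.

```python
import math

# Zero-order correlations from the table of intercorrelations (rounded values).
PAIRS = {
    ("honor", "intelligence"): 0.60,
    ("honor", "grades"): 0.40,
    ("honor", "units"): 0.22,
    ("honor", "hours"): 0.32,
    ("intelligence", "grades"): 0.36,
    ("intelligence", "units"): 0.20,
    ("intelligence", "hours"): -0.35,
    ("grades", "units"): 0.40,
    ("grades", "hours"): 0.11,
    ("units", "hours"): 0.25,
}


def r(a, b):
    """Zero-order correlation, looked up in either order."""
    return PAIRS[(a, b)] if (a, b) in PAIRS else PAIRS[(b, a)]


def partial(a, b, held=()):
    """Partial correlation of a and b with every variable in `held` constant."""
    if not held:
        return r(a, b)
    k, rest = held[0], held[1:]
    r_ab = partial(a, b, rest)
    r_ak = partial(a, k, rest)
    r_bk = partial(b, k, rest)
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


print(round(partial("honor", "units", ("intelligence",)), 3))
print(round(partial("honor", "units", ("grades",)), 3))
print(round(partial("honor", "units", ("intelligence", "grades", "hours")), 3))
```

Run on the rounded table, the first two values agree with the +.127 and +.071 quoted above to within rounding, and the last comes out negative and close to the printed figure, as the negative weight on units in equation (1) would lead one to expect.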

Aside from statistical treatment there are other considerations that
lead us to believe that the amount of preparatory school or high school
work is a very poor index to academic success. High school graduates
who come to college with three years of foreign language are in many
instances unable to enter the fourth-year course in that language with
a college-trained group but are forced to drop back and take the
“college” third year, which they tell us is usually equivalent to the high
school fourth year. This seems to be true not only in languages but also
in sciences and other subjects. For example, students who have had a
year of physics in high school are seldom able to take advanced courses
in college without first taking the elementary course. Indeed, there
are now and then cases in which the student seems to be worse off by
having taken the high school course at all.

To go back again to the statistical data, the equation for predicting
honor points from units alone is:

Honor points = 1.61 units - 8
r = .22 and σ_e = 10.6 and PE = 7.15

According to this a student may offer as few as 10 units and still have
an even chance of making enough honor points to stay in college (9).
By plotting the percentage of students who fail at the different levels
of units and extending the line on down by extrapolation we find that
around 8 units about 50 per cent would pass and 50 per cent fail. On
the other hand, according to the above formula, in order for a student
to be reasonably certain of success he would have to offer at least 30
units. When we substitute 15 units in the above equation we find that
the most probable number of honor points is 16 plus or minus 7. It so
happens that 16 is just enough honor points to keep a member of this
group off probation.

This suggests the question of the basis of the traditional standard
of 15 units for admission to college. Why 15 rather than 10, or 20,
or some other number? The answer seems to be that after many years
of experience in several institutions 15 units seems to be about right.
Furthermore, four years of high school work at the rate of five subjects
per year yields 20 units of work, but the presumption is that on the
average 5 of these units will be the wrong kind. It would be an
interesting scientific study to determine the minimum number of units
a student could offer and still have a fair chance of success. The data
here seem to indicate that we could afford to drop the units requirement
to say 10 units, and then rigidly apply other means of selection.

The factor of high school averages turns out to be somewhat more
significant than units when tested by the partial correlation method.
The partial correlation between high school grades and honor points
with intelligence constant is +.246; with time spent at study constant
is +.388; with units constant is +.348; with both time and intelligence
constant is +.318. This would indicate that there is some
causal connection between high school averages and academic success.
Hence we may assume, until we have evidence to the contrary, that the
quality of high school work as measured by grades is a factor of success
in college, although a relatively minor factor.

That general intelligence is the most important single factor in
academic success there is no doubt. When we apply the partial
correlation method of analysis the results are surprising. The partial
correlation between intelligence and honor points with time spent in
study constant is +.805; with high school averages constant +.532;
with units constant +.59. The high partial correlation of +.805 is due
to the negative correlation of -.35 between intelligence and time spent
in study. The brighter they are the less they study. Thus some
bright students will make low grades and some dull students will make
high grades by sheer industry. If application to work were proportional
to ability, then all correlations between college grades and intelligence
scores would rise much higher.

The general conclusion from all this is that the most reliable means 
of predicting academic success is a combination of intelligence and 
degree of application. Since we cannot know the factor of application 
in advance of the student’s admission to college, it cannot be used for 
the purposes of prediction except in a limited way. If a student has an 
intelligence rating of X and wants to know how many honor points he 
will receive if he studies Y hours per week, we can tell him by making 
the proper substitutions in equation (3). 

If on the other hand the student wants to know how much time he
must spend in study in order to receive a certain number of honor
points, we must use the other regression.

Time = .45 honor points - .326 intelligence + 50

The PE of the errors of prediction is 2.7 hours per week. Suppose
an individual has an intelligence rating of 90 and wants to make 16
honor points: he must study on the average 28 hours a week, plus
or minus 2.7. This seems to be the most practical and most reliable
kind of prediction that we can make. For easy reference and everyday
use we can construct a table from the above equation which will
aid in giving students advice as to how much studying they should do
in order to receive a given number of honor points. Of course, this
sort of thing has limits and will work only in a restricted range.

Table for predicting the number of hours per week a freshman must
study during the first semester in order to obtain a given number of
honor points when his intelligence score on the combination of tests here
used is known.

The three upper rows of this table are given here merely to illustrate
the method.

                                       Honor Points
Intelli-
gence     15    18    21    24    27    30    33    36    39    42    45    48

140      11.1  12.5  13.8  15.2  16.5  17.8  19.2  20.6  21.9  23.2  24.6  26.0
135      12.7  14.1  15.4  16.8  18.1  19.5  20.8  22.2  23.5  24.9  26.2  27.6
130      14.4  15.7  17.1  18.4  19.8  21.1  22.4  23.8  25.2  26.5  27.8  29.2

If, for example, a student has an intelligence score of 140 and wants
to make 15 honor points he must study on the average 11 hours per
week, plus or minus 3.

This sort of prediction is valuable for students who have been
admitted to college. It has little value as a means of selection because,
as stated above, the factor of application is not known.
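
A table of this kind can be generated directly from the regression of time on honor points and intelligence. The sketch below uses the coefficients as printed (.45, -.326, +50); the function name and the choice of honor-point targets from 15 to 48 in steps of 3 are assumptions made for the illustration. Because the printed coefficients are rounded, an entry may differ from the published table by a tenth of an hour.

```python
def hours_needed(honor_points, intelligence):
    """Hours of study per week from the regression of time on the other two factors."""
    return 0.45 * honor_points - 0.326 * intelligence + 50.0


honor_targets = range(15, 49, 3)  # 15, 18, ..., 48

print("score", *honor_targets)
for score in (140, 135, 130):
    row = [round(hours_needed(target, score), 1) for target in honor_targets]
    print(score, *row)

# The example from the text: intelligence rating 90, 16 honor points wanted.
print(round(hours_needed(16, 90)))
```

Each row should be read with the probable error of 2.7 hours per week in mind, and only within the range of scores actually observed.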

Looking to the future we may profitably inquire as to what should
be known in order to predict academic success with a reasonable degree
of certainty. By “reasonable” we mean that the PE of prediction
should not be more than 3 honor points. If the distribution of honor
points were normal the standard deviation would be about one-sixth of
the range, or 8 honor points. As a matter of fact nearly all distributions
of academic success are more or less skewed, which means that
the standard deviation is more than one-sixth of the range. Thus with
a standard deviation of 8 to 10 honor points the correlation with the
agencies of prediction must be at least .90 in order that the standard
deviation of the errors of prediction be reduced to 3 honor points. In
the last analysis we are seeking a combination of traits and elements
which will correlate as much as .90 with academic success. Such a
correlation will probably not be obtained until we can measure some of
the more or less intangible traits of character and personality.
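
The correlation needed for a given accuracy follows from the error formula σ_e = σ √(1 - r²) solved for r. The sketch below works it out for a standard deviation of 8 and of 10 honor points, reading the target of 3 honor points both as a standard deviation of the errors and as a probable error (.6745 times the standard deviation). The function name and the two readings are assumptions made for the illustration, not the author's own computation.

```python
import math

PROBABLE_ERROR_FACTOR = 0.6745


def required_correlation(sd_criterion, target_sd_error):
    """Correlation at which the standard deviation of the errors equals the target."""
    ratio = target_sd_error / sd_criterion
    return math.sqrt(1.0 - ratio ** 2)


for sd in (8.0, 10.0):
    as_sd = required_correlation(sd, 3.0)
    as_pe = required_correlation(sd, 3.0 / PROBABLE_ERROR_FACTOR)
    print(sd, round(as_sd, 3), round(as_pe, 3))
```

Under either reading the required correlation is well above anything reached by the equations in this study, which is the point of the paragraph above.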



A COMPARISON OF IQ’s OBTAINED WITH DEARBORN 

GROUP TESTS AND THE STANFORD REVISION 

FRANK S. FREEMAN

Supt. of Education, State Infirmary, Tewksbury, Mass.

The accompanying table shows the intelligence quotients obtained
by the use of the Dearborn Group Tests and the Stanford Revision of
the Binet on the same group of children. The mental ages are not
given here because they would contribute little, inasmuch as the tests
were given at different periods, varying from one year to one and one
half years.

It is interesting to note that of the 75 cases here quoted 46 show 
the IQ's of the Dearborn tests to be higher than those of the Binet by 
from one to ten points, excluding a few exceptional cases which merit 
special consideration. There are five cases which show no difference 
at all, and 24 which give a smaller Dearborn IQ. 

Even the same test given by two different examiners will generally
show a small difference in the IQ because of the personal equation of
the examiner, the attitude and the general condition of the child on
the different days. This factor will account for the differences of
most of the cases. The preponderance of larger Dearborn IQ's, as
well as the fact that it has been observed that group test IQ's are
generally slightly higher than those of individual examinations, is
believed to be due to the greater language element in the latter type.

Now cases 70, 71, 72, 73, 74, and 75 show fairly large differences
and need explanation, although there are only two (Nos. 70 and 74)
which show such variations that the Dearborn IQ's place the girls in
one category and the Binet IQ’s place them in an entirely different
one. Number 70 is that of a girl whose hearing is defective, and
although the Binet examiner was quite likely aware of this fact it is
not felt that the handicap was entirely overcome. In giving the
Dearborn test the girl's teacher repeated the questions for her particular
benefit and in the manner to which she was accustomed to
hearing her directions. Furthermore this pupil’s school work bears
out the mental age of 6-9 obtained with the group tests.

Numbers 71, 72, 73, and 75 show variations which can be explained
by the language factor. These four girls had just begun their schooling
at the time the Binet tests were given, and, therefore, had not the
language ability which is developed through schooling and which is a
rather essential factor in the Binet. It is to be noted, however, that in
none of these cases does the IQ of either test place the pupil in a category
different from that designated by the other.

Number 74 shows in the Dearborn tests a mental age of 11-6 with
an IQ of 82; in the Binet the same girl shows a mental age of 9-2 and
an IQ of 57 — a considerable discrepancy. This pupil was doing quite
satisfactory Grade V work, so that it is felt the mental age of 11-6
obtained with the group tests is more nearly correct than that of 9-2.
This large difference is probably due to the new surroundings under
which the Binet was given, a strange examiner, and an unwillingness
to extend the maximum of cooperation during the examination. It
is hardly likely that there has been a special deterioration in the last
year because the girl is still doing essentially the same grade of work,
both class room and manual.

No general conclusions can be based upon these 75 comparisons,
but they do offer a serious case for the advisability of using group
tests even among low grade mentalities, as these obviously are.
Group tests among this type of child have their advantages, among
them being the vastly reduced nervousness and the greater ease with which
the child works when not under such very close scrutiny as in individual
tests. This, it has been observed, is a factor of considerable importance.

It has been said by some that for low mentalities group tests are
not satisfactory, but after examining 250 mentally defective children
with such tests, and finding a high correlation with actual school
ability, it appears that they may well be used among sub-normals as
well as among normals.

Not only does this comparison indicate a high correlation of the
Dearborn group tests with the individual test, but it contributes
towards the belief in the constancy of IQ's among low grade mentalities,
the average difference here being for all cases 4.6 points; for the
positive 5.6 points, and for the negative 3.7 points.

Table 1


Case   Dearborn   Binet   Differ-     Case   Dearborn   Binet   Differ-
No.       IQ        IQ     ence       No.       IQ        IQ     ence

  1       31        34                 39                 63        8
  2       62        48                 40                 80        6
  3       54        46       8         41                 67        5
  4       62        66      -4         42       72        75       -3
  5       50        53      -3         43       64        58        6
  6       55        52       3         44       81        76        6
  7       48        44                 45       61        60        2
  8       63        49                 46       78        76        2
  9       47        45                 47       70        71       -1
 10       57        65      -8         48       82        88       -6
 11       48        50                 49       74        85      -11
 12       67        63                 50       60        68       -8
 13       79        80                 51       69        63        6
 14       69        64       6         52       70        66        4
 15       72        67       6         53       64        66       -2
 16       60        58       2         54       54        51        3
 17       62        56       6         55       60        60        0
 18       75        70       6         56       61        60        1
 19       67        68                 57       57        62        6
 20       61        66                 58       67        67        0
 21       62        51                 59       67        61        6
 22       68                           60       66        60        5
 23       91        87                 61       71        71        0
 24       81        77                 62       70        66
 25       74        70                 63       67        62
 26       72        76      -3         64       42        44
 27       70        81      -6         65       47        47        0
 28       60        68       2         66       71        65        6
 29       77        74       3         67       51        52       -1
 30       75        77      -2         68       64        62        2
 31       70        63       7         69       39        46       -7
 32       60        54       6         70       62        46       17
 33       70        72                 71       50        40       10
 34       67        68                 72       57        47       10
 35       70        61       9         73       63        51       12
 36       46        50      -4         74       82        57       25
 37       70        67       3         75       51        42       10
 38       45        53      -8
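
The kind of summary quoted in the text (how many differences are positive, zero or negative, and the average size of each kind) is simple to compute once the IQ pairs are tabulated. The sketch below uses only a handful of cases from Table 1 whose three printed entries agree with one another, so its figures describe that subset and not the full set of 75; the function and key names are invented for the illustration.

```python
# Case number -> (Dearborn IQ, Binet IQ); a small subset of Table 1.
pairs = {
    4: (62, 66), 5: (50, 53), 6: (55, 52), 42: (72, 75), 43: (64, 58),
    46: (78, 76), 47: (70, 71), 48: (82, 88), 49: (74, 85), 51: (69, 63),
    58: (67, 67), 61: (71, 71), 74: (82, 57),
}


def summarize(pairs):
    """Counts and mean sizes of Dearborn-minus-Binet differences."""
    diffs = [dearborn - binet for dearborn, binet in pairs.values()]
    positive = [d for d in diffs if d > 0]
    negative = [d for d in diffs if d < 0]
    return {
        "positive": len(positive),
        "zero": diffs.count(0),
        "negative": len(negative),
        "mean_abs_all": sum(abs(d) for d in diffs) / len(diffs),
        "mean_positive": sum(positive) / len(positive),
        "mean_negative": sum(negative) / len(negative),
    }


for key, value in summarize(pairs).items():
    print(key, round(value, 2))
```

The single large difference of case 74 pulls the positive average up sharply in so small a subset, which illustrates why the exceptional cases are discussed separately in the text.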





















NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


DEPARTMENT IN CHARGE OF LAURA ZIRBES¹

1. Mechanical Ability Measured.—Data and measuring instruments
made accessible by such advances into the realm of mental measurement
as the new Stenquist tests,² may lead us to reconsider and revise
current definitions of general intelligence and recognize the specific
nature of mental endowment as clearly as we have come to recognize
the need for specific training. Extended experimentation has
resulted in three series of assembling tests involving the use of numerous
actual models, and two tests of mechanical aptitude in which
pictures are used. All the tests are designed for use with groups or
classes. The McCall method was used in scaling and eliminating
elements. The test results were not only correlated with other criteria
of mechanical ability but also with the pooled results of six group
intelligence tests. While the data are amply indicative of the significance
and validity of the new measures, the correlations with so-called
“general intelligence” are so low that they exhibit the inadequacy and
injustice of mental classifications based on the sole use of instruments
which disregard this field of mental activity.

In the Stenquist tests the difference between the median score
(or norm) of adult (army) men and that of fifteen-year-old boys is
greater than the difference between medians (or norms) of adjacent age
groups between twelve and fourteen (boys). A few partial records
for girls and adult women show further differences which make one
wonder to what extent differences in opportunity, incidental training
or restricted experience are reflected in the performance of individuals
of either sex. To reduce the effects of such variable factors, if for no
other reason, would it not be feasible to devise and standardize a
battery of tests worked out on the basis of a definition of intelligence as
aptitude in learning, or relative ability to profit by controlled learning
experiences? A study of the variations in the differences between
“before and after” measurements or of variations in the amount of
controlled practice required to reach a given degree of proficiency might
furnish a significant criterion or index of innate capacity or aptitude.
Meanwhile, a realization that the scores reflect training as well
as capacity should temper criticism and influence interpretations.

¹ All unsigned reviews were prepared by Laura Zirbes.

² Stenquist, J. L.: Measurements of Mechanical Ability. New York, Teachers
College, Columbia University, Contributions to Education No. 130, 1923, pp. ix +
101.

2. Of What Use Are Common People?—This question is the captious
title¹ of a recent addition to the controversial literature on the topic
“Democracy and the IQ.” A perusal of the chapter headings will
suffice to interest any student of social or educational psychology.
Those who are acquainted with this author’s critical propensities from
other writings over his signature or nom de plume will not wish to miss
his discussion of such topics as the following: “Education and Commonalty,
Mutinous Minorities, Vanity of Manners, Philosophy and
Mediocrity, Reproducing Their Kind, Democracy in Practice.”
According to the Prophet “Ezekiel,” profound pessimism as to the
soundness of democratic institutions and ideals is not warranted, at
least not while the “mugwumps have nothing better than democracy
to offer America.” From the concluding chapter which, by the way,
is written in true apocalyptic style, mixed metaphors modernized but
otherwise unabated, we quote the following choice bits (pp. 266-246).

“Visionaries see reflected, by mirage, the perfect state and reach
out for it just as a baby stretches forth his hands for the moon . . .
Man, to inhabit Utopia, would need to check his human element on
entrance . . . Moreover, man, despite his physical, mental and
spiritual limitations, is capable of developing democracy so that the
government of the United States as it exists today will seem as different
from what it will then be, as the treadmill is from the dynamo
. . . The appeal of target practice lies largely in the difficulty of
making a high score . . . The ideal of democracy may be regarded as
the bull’s eye of a target . . . The bull’s eye of democracy is a
government in which the interests of all the people are pooled so as to
create a community of interests and, in turn, the acceptance of this
community of interest as a touchstone by which to test all governmental
activities.”

3. Two Texts on Mental Tests.—Although these two volumes are
on the same subject they are surprisingly dissimilar. Dickson’s²
intensely concrete discussion and his numerous practical suggestions
show that the work is an outgrowth of experimentation and varied
and extensive experience with mental tests. His solutions of a host of
specific problems which he raises are significant because they have
already been put to pragmatic tests in the large school systems from
which most of the statistical data were secured. Well selected and
readable tables, charts and figures should wither prejudice and dispose
the reader to take more kindly to the somewhat didactic prescription
and authoritative tone. The concise summaries, selected references,
quotations from case studies, analytical outline, the subheadings,
paragraph heads and also the index add to the pedagogical value of the
book as a text and should recommend it to those who need a means of
securing enlightened teacher cooperation in working out pertinent policies
in school systems. Terman’s editorial introduction recognizes
the qualifications of the author in this respect.

¹ Buchholz, Heinrich E. (Ezekiel Cheever): “Of What Use are Common
People? A Study in Democracy.” Baltimore, Warwick & York, Inc., 1923.

² Dickson, V. E.: “Mental Tests and the Classroom Teacher.” Yonkers-on-
Hudson, New York, World Book Company, 1923, pp. xiv + 231.

The Hines monograph,¹ edited by Suzzallo, is much briefer, more
discursive, contains only general references to investigators, with no
quantitative data but instead, numerous references to and quotations
from the writings of psychologists and publicists who have participated
in the construction and criticism of mental tests. This text is
more historical and informative. Those who have no practical interest
or need for more precise data will perhaps get from this briefer analysis
the necessary background for a more intelligent grasp of the debatable
points involved in current discussions of the subject. A comparison
of the outlines of the Dickson and Hines texts will make these differences
stand out clearly and help those who must make a choice. But
laymen who intend to contribute to the discussion will do well to read
Hines, study Dickson and ponder well over the references in both books
before they take their pens in hand.

4. Verbalism and Formalism versus Vitalized Instruction.—In this
little three chapter monograph² the author exhibits instances to
support the contention that more than 60 per cent of our teaching
results in a relatively meaningless repetition of words. He analyzes
the causes of over-verbalism and examines in detail the significance
of each of the following contributing factors: (1) The isolation of the
school, (2) the symbolic nature of the curriculum, (3) the passivity of

¹ Hines, H. D.: “Measuring Intelligence.” Boston, Houghton Mifflin Company,
1923, pp. xii + 146.

² Ruediger, W. G.: “Vitalized Teaching.” Boston, Houghton Mifflin Company,
1923, pp. xiii + 110.

the child, (4) the limitations of the teacher. He mentions the complications
caused by overcrowded curricula and the dominance of the
doctrine of formal discipline, and then proceeds, first to a statement of
the principles which underlie the solution of the problem and then to an
alignment of practical ways and means based on sound principles.
In passing, he criticizes a variety of cure-alls and shibboleths, and calls
attention to the place of sound psychology and pedagogy in a critical
evaluation of means and methods of exhibiting subject matter and
enlisting pupil activity.

This book should prove valuable in the training of teachers, first
because it is constructive notwithstanding its critical tenor, and second
because the content is so lucid and well-organized that it should be
readily assimilated. The appended outline should also prove exceedingly
helpful to student and instructor.

5. Mental and Educational Tests in Higher Education.—Here we
have a book¹ containing (1) a description of the “New Plan of Admission,”
adopted by Columbia University in 1919, and a critical discussion of the
Thorndike Examination and data secured by its use and (2) an analysis of
the factors of college success and of the principles of educational
measurement upon which new types of content examinations in
college subjects have been constructed. A large number of sample
examinations included illustrate the application of these principles,
the variety of technics used, and the nature of the data obtainable
through the use of such tests in Physics, Zoology, Government,
History, Economics, Philosophy, Architecture, Hydraulics and
English.

The college entrance board examination in plane geometry is
criticized and an array of sample questions for a more valid examination
in Geometry is submitted for consideration. In the appendices
are sample questions from the Thorndike examination, data on its
separate parts, and suggestions for its improvement. Instead of
merely writing the usual brief editorial introduction, Dr. Terman has
contributed the first chapter of the book in which he challenges
institutions of higher learning to undertake a serious type of personnel
research and outlines the functions of a bureau charged with that task.

6. On the Psychology of Early Childhood.—This book by an English
author² will prove helpful to parents and others charged with the

¹ Wood, B. D.: “Measurement in Higher Education.” Yonkers-on-Hudson,
New York, World Book Company, 1923, pp. xi + 337.

² Drummond, M.: “Some Contributions to Child Psychology.” New York,
Longmans, Green and Company, 1923, pp. viii + 161.
AN EXPERIMENTAL STUDY OF THE GROWTH
OF SOCIAL PERCEPTION¹

GEORGINA STRICKLAND GATES 
Barnard College, Columbia University 

Ability to meet successfully social situations (social intelligence 
as it is sometimes called) depends, in part, on the capacity to make the 
appropriate tactful, courteous, aggressive, etc., reaction and, in part, 
on the ability to perceive accurately the conditions which are 
encountered. Of the latter requirement, adequate interpretation of 
the facial expressions of others forms an important ingredient. 

That this ability increases with age is obvious. A very young 
child may mistake friendly behavior for threatening, or misinterpret 
fear as anger, or fail entirely to note signs of nervousness or excitement 
in those around him. An adult may be able to detect very slight 
evidence of chagrin or distaste or amazement. 

It is the purpose of this paper to attempt a study of the genesis of
the ability to interpret the emotional expressions of others. Photographs
were used as the material because of their availability and of
the possibility of exact standardization here. Six pictures were chosen
from a set published by Ruckmick,² the basis of selection being the
diversity of emotions represented and the relative ease of interpretation
by adults, as determined by the performance of the individuals
in Ruckmick’s experiment, and by the judgment of 10 individuals in an
advanced psychology class tested by the present author. Picture
A (Plate I) shows a laughing face expressing joy, mirth, amusement;
Picture B is posed for pain; Picture C expresses anger or defiance;

¹ The experimental work of this study was performed by Helen Goldstone,
Margaret Miller and Mary E. Langton. Miss Goldstone is responsible for the
testing of Group II, Miss Miller of Group I, and Miss Langton of Group III.

² Ruckmick, C. A.: Psychological Monographs No. 130, 1921, pp. 30-35.

Picture D shows fear or horror; Picture E is scorn, contempt or disdain;
and Picture F represents surprise, wonder or amazement.

The pictures were shown singly and individually to 458 children
ranging in ages from 3 to 14, and varying greatly in social status.
Three groups of children were tested: Group I was composed of 79
individuals from 3 to 6 years old, from the Kindergarten of a select
private school; Group II of 266 children from the Kindergarten, first,
second, third and fourth grades of a public school in a poor neighborhood
of New York City, of which Irish, Jewish, and Italian children
formed the majority; and Group III was made up of 30 children from
a Settlement House in New York City, and 84 from the fifth and sixth
grades of a public school which children of average social status
attended. Table I shows the distribution of ages.

The child was asked in the case of each picture, “What is this
lady doing?” and if this brought no response, or an ambiguous one,
“What is the lady thinking about?” or “How does she feel?” or the
subject was simply urged to tell more about it. Not mere understanding
of the emotion then, but ability to convey the interpretation in
some way to others was required. The response was in each case, of
course, recorded verbatim.

The pictures were also presented to the members of an introductory
class in psychology, in an attempt to obtain an adult standard of
comparison. The purpose of the experiment was explained to these
subjects, and they were asked to write down, in each case, the emotion
or feeling represented.


Table I.—Number of Children Tested in Various Age Groups

Age at last birthday     3     4     5     6     7     8     9    10    11    12    13    14
Group I                 10    26    32    11
Group II                      14    63    47    60    62    31   [?]     4     3
Group III                                                    8   [?]    40    23    14     8
Total                   10    40    85    68    60    62    30    23    44   [?]   [?]   [?]



Table II gives the results of the adult interpretations. A very close
agreement is evident. If we score the responses somewhat liberally,
disregarding differences in vocabulary, in understanding of the instructions,
and variations in degree or in minor accompaniments of the




































The Journal of Educational Psychology


emotion interpreted, we may say that all the subjects successfully
described Pictures A and E, that 98 per cent were correct in their
interpretations of D, 95 per cent in C, 89 per cent in B and 84 per
cent in F. Evidently these observers agree most closely on laughter
and scorn, next on fear, next on anger, and least well on pain and
surprise.

Lack of space prevents the printing of the responses made by each
child to each picture. The interpretations made by Group I (the
Kindergarten class of a private school) to Pictures A, C and F are,
however, recorded in Table III to illustrate the kind of response obtained
from the younger children.

Difficulties much more marked than in the case of adults occur
when an attempt is made to evaluate the replies. Our standards
were exceedingly liberal; any interpretation which expressed the
general trend of feeling in the picture was counted as correct. In the
first picture, such responses as “She’s laughing,” “happy,” “smiling,”
“grinning,” “looking at something nice,” “thinking about her cute
little baby,” were obviously correct, and mere descriptions of the
features (“showing her teeth,” “cleaning her teeth,” “putting her
teeth in”), such replies as “making faces” and the occasional “I
don’t know” or refusals to answer were, of course, unsuccessful
responses. In the second picture, “crying,” “hollering,” “having bad
dreams,” “tooth-ache,” “sees something she doesn’t like,” “mad,”
“sick,” “sad,” “pleading,” “Ow,” “she’s cross,” “she’s frightened,”
were considered good interpretations, and “sneezing,” “making a
face,” “looking up in the sky,” “looking at something,” “shouting,”
“showing her teeth,” “singing,” “staring,” were expressions which
did not seem to describe the emotion represented.
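The scoring rule for the first picture can be sketched in code. The sketch below is only an illustration under stated assumptions: the accepted responses are those quoted above for Picture A, while the exact-match comparison, the `normalize` helper and the function names are hypothetical conveniences. The examiners judged each reply by its general trend of feeling, which no fixed word list fully reproduces.

```python
# Responses the text counts as correct for Picture A (laughter).
CORRECT_A = {
    "she's laughing",
    "happy",
    "smiling",
    "grinning",
    "looking at something nice",
    "thinking about her cute little baby",
}


def normalize(response):
    """Lower-case a reply, straighten apostrophes, drop a trailing period."""
    return response.strip().lower().replace("\u2019", "'").rstrip(".")


def is_correct(response, accepted):
    """Assumption: a reply counts only if it matches an accepted phrase exactly."""
    return normalize(response) in accepted


def percent_correct(responses, accepted):
    """Percentage of replies in a group that are scored as successful."""
    if not responses:
        raise ValueError("at least one response is required")
    hits = sum(is_correct(r, accepted) for r in responses)
    return 100.0 * hits / len(responses)


# Four illustrative replies: two accepted, two rejected.
sample = ["Happy", "making faces", "Smiling.", "I don't know"]
print(percent_correct(sample, CORRECT_A))  # 50.0
```

A fixed list of this kind makes the scoring reproducible, but it would reject paraphrases that a human examiner would accept, so it understates the liberality of the standard actually used.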

Pictures C, D, E and F required a more specific interpretation;
not merely pleasant or unpleasant feeling tone must be described,
but a suggestion of the anger, fear, scorn, or surprise and wonder
represented. For Picture C we counted as correct, “mad,” “looking
mean,” “cross,” “angry,” “scolding,” “looks furious,” “like killing
some one,” “fighting,” and as incorrect such interpretations as “laughing,”
“smiling,” “scared,” “singing,” “sad,” “coughing,” “amazed,”
“crying,” “making a funny face,” “pouting,” and even such partly
correct expressions as “looks serious,” “sees something bad,” “frowning,”
“crying,” “feeling badly.” The picture by which fear is represented
was reported correctly as “frightened,” “scared,” “thinking
about a wild animal,” “looks as if she saw something terrible,”
“horrified,” and incorrectly as “laughing,” “looking sad,” “cross,”
“mad,” “playing the piano,” “scolding,” “opening her mouth,”
“making a face,” “sore,” “surprised,” “sick,” “crying,” etc. Scorn
or contempt, which was much more difficult for the children, yielded
some replies which we might consider correct, as, “don’t like anybody,”
“doesn’t think much of a person,” “don’t care,” “sees something that
isn’t nice,” “disagreeable proud,” “scornful,” “disgusted,” “jealous,”
“she’s a tough one,” “knows it all,” “shows off,” and obviously inadequate
expressions as “dressing,” “looking at something,” “looking
nice,” “silly,” “laughing,” “likes somebody,” “looks aside,” “sorry,”
“happy,” “miserable,” “pensive,” “mad,” “crying.” “Surprise,”
“wondering,” “astonished,” “staring,” “something suddenly came,”
“saw something,” were correct interpretations of the last picture, and
“washing teeth,” “making a face,” “scolding,” “hollering,” “praying,”
“breathing,” “smiling,” “looking wise,” “looking stern,” “sick,”
“posing,” “just over a fit,” “not well,” “angry,” and replies which
denoted fear alone without the element of surprise, were among the
incorrect responses.

It is very easy to quarrel with these classifications or to question
the individual interpretation of replies made by the examiners. In
the case of Picture B, for example, though the original intention was
to limit the acceptable responses to pain, sadness, grief, crying, and the
like, so many of the older children interpreted this as anger or fear,
that it was decided to add these to the list of successful responses.
For the picture of scorn, the reply “jealous,” frequently given, was
after some debate admitted as a correct reaction. A large number
of cases occur where the proper scoring of individual responses was
somewhat doubtful.

Taking the scores then, for what they are worth, and realizing the
difficulties of interpretation which the questioner often met, we find,
according to Tables IV and V, a gradual increase in ability to interpret
each picture as we pass from the youngest to the oldest children.
Occasional variations occur which we may, and will, explain as due to
differences in number or ability of the age groups. The general trend
is upward. Laughter is understood by more than half of the children
whose age at the last birthday was three; pain is correctly interpreted
by more than half of those from ages 6 years 0 months to 6 years 11
months; anger is understood at 7; fear at 10; surprise at 11; and scorn
is described by only 43 per cent of the 11-year-old children, though all
the adults in the superior group tested understood this picture.

When we group the children by grades (Table VI), we find, if we
use again the 50 per cent mark as our measure, that more than half of
the Kindergarten children understood the laughter picture, more than
half of Grade I the pain picture, Grade III anger, Grade IV fear,
Grade VI surprise or wonder, and the portrayal of scorn is understood
by 43 per cent of Grade V.
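The 50 per cent criterion used in the two preceding paragraphs amounts to finding the youngest age (or lowest grade) at which more than half of the children interpret a picture correctly. A minimal sketch follows; the function name and the proportions in `example` are hypothetical and are not taken from Tables IV to VI.

```python
def first_age_above(rates_by_age, threshold=0.50):
    """Return the youngest age whose success rate exceeds the threshold.

    rates_by_age maps age at last birthday to the proportion of successful
    responses. Returns None when no age group exceeds the threshold.
    """
    for age in sorted(rates_by_age):
        if rates_by_age[age] > threshold:
            return age
    return None


# Hypothetical proportions for one picture, for illustration only.
example = {3: 0.10, 4: 0.20, 5: 0.35, 6: 0.48, 7: 0.55, 8: 0.51}
print(first_age_above(example))  # 7
```

Because the observed percentages do not rise perfectly steadily, the first age to pass the mark may be followed by an age group that falls below it again; the sketch reports only the first crossing and leaves such reversals to be weighed separately.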

The evidence from this testing shows, then, that the probable
order of difficulty for the pictures (from least to greatest) is, for children,
laughter, pain, anger, fear, surprise, and scorn. This differs from that
found for adults (Table II) where the order is laughter and scorn
(100 per cent), fear, anger, pain, and surprise. The greatest discrepancy
occurs in the case of scorn, which is easy for the superior adult
and extremely difficult for the child, and pain, which the child readily
interprets and which the adults seem to feel to be inadequately
represented.

It may be that differences occur which are due to the social status
of the individual tested. We might expect that the children from the
private school would do better because they are, in general, brighter,
and because their vocabulary is presumably more extensive; or we
might expect that the children from poorer neighborhoods, of foreign
parentage, would have more opportunity to observe expressions of
emotions both at home and in the movies which they attend more
frequently than their more wealthy competitors, and so might show
greater ability along this line. An examination of Table IV shows that
when we compare children of like ages in Groups I and II, Group
II is better in interpreting the pain picture and Group I slightly better
in interpretations of the other five. The small size of the differences
and the fact that different examiners questioned the two groups
should be taken into account when the statement is made that in
general the children from the private school did better.

Sex differences, if they exist, should be described here as elsewhere.
Perhaps the superior “social tact” attributed to women is due to 
better ability to interpret the emotional expressions of others. A 
tabulation according to sex and age shows no differences. At the 
ages of four, five, and nine, the girls are slightly superior; at five, six, 
seven and eight, the boys. As the differences are small as well as 
inconsistent, chance or other factors (as differences in intelligence) 
may be invoked to explain them. There appears to be no difference 
in the reactions of the two sexes to the various pictures except in the 
case of the representation of fear where the boys are always superior. 
Table III.—Individual Responses Made to Picture A (Laughter) by Group I

                                        Number of times given by children of each age
Response                                3 yr.   4 yr.   5 yr.   6 yr.
Laughing                                  6      18      23       5
Smiling                                   1       2       1       1
Putting teeth together                    2
Putting teeth in                          1
Rough                                             1
Showing teeth                                     3
No response                                       1       1
Making faces                                      1
Grinning                                                  1
Looking at something                                      1
Tipping over head                                         1
Looking at her teeth                                      1
Thinking of happy things                                  1
Thinking of something nice                                1
Sees doggie                                               1
Having her picture taken                                          1
Cleaning her teeth                                                1
Happy                                                             1
Sees something she likes                                          1
Thinking about her cute little baby                               1
Table III.—(Continued)

                                        Number of times given by children of each age
Response                                3 yr.   4 yr.   5 yr.   6 yr.
Mad                                       2       4       7       2
Looking mean                              1
Not laughing                              1
Opening her eyes                          1       1       1
Frowning                                  1               4
“Eyes”                                            1
Crying                                            1               1
Cross                                             9       3       1
Making a funny face                               5               1
Nothing                                           1
Sitting                                           1
Walking                                           1
Feeling badly                                             1
Making a face                                             5
Looking at a picture                                      1
Pouting                                                   1
Scolding                                                  1       1
Looks serious                                             1
Looks furious                                             1
Looking                                                   1
Sees me                                                   1
Angry                                                     1
Sees bird                                                 2
Angry naughty lady                                                1
Not happy                                                         1
Sees something bad                                                1
Sees rat                                                          1
Thinking about something mad                                      1
“I don’t know”                            4       2       1       1
Table III.—(Continued)

                                        Number of times given by children of each age
Response                                3 yr.   4 yr.   5 yr.   6 yr.

Staring. 

1 




Looks mean.- 

1 




Mouth open. 

2 

1 

4 


Not laughing.. 

1 




Nothing. 

1 

2 

1 


Something funny. 

1 




Looking at something.. 


2 

■ 4 


Just looking.. 


7 

2 

2 

Washing teeth.. 


1 





1 



Being friendly.. 


1 



Looks wild... 





Sitting. 





Walking..... 





Isn’t making a face. 


1 



Talking. 


1 

1 

2 

Cross.. 


1 

1 


Making a face....... 


1 

2 


Just oU alone. 



1 


Bogs trying to get in. 



1 


Sees animals... 



2 

2 

Scolding. 



1 

1 

Looks serious... 



1 


Showing teeth. 





Smiling.. 





Mod. 





Bound eyes. 



' 1 


Opening mouth and eyes. 





iPrightened. 





Hears something. 





Laughing.. 





Sees something she likes........ 



, 

1 

Surprised. 




1 

“I don't know". 

3 

4 

3 

1 
Table IV.—Percentage of Successful Responses at Various Ages, for
Children in the Three Groups
Table V.—Percentage of Successful Responses for Children of Various
Ages, Combining the Three Groups


Number of cases. 

10 

40 

85 

59 

55 

58 

39 

28 

44 

27 

17 

8 

36 

Ages. 

3 

4 

6 

M 

7 

8 


1 

11 

12 

13 

14 

Adult 

ficturo A.. 

.70 


.84 

iQ 


EE 

.95 


.98 

H 



100 

Picture B. 

.40 

.40 

.44 

Q 

.72 

.43 

■tl! 

,64 

.79 

.74 

.77 


SO 

Picture C. 

.30 


1 ^ 

iB 

.52 


.72 

.S2 

.82 

.93 


,87 

05 

Picture D ... 

1 ^ 

IQ 

,13 

.17 


.32 

.46 

.67 

.77 

.80 


.75 

OS 

Picture E. 

\m 

IQ 


Kg 

Q 



.42 

.43 


,18 

,26 

100 

Picture F. 

.10 

n 

M 


n 

1 

1 

■ 

.67 

.62 

,41 


84 


The results of the experiment indicate in general, then, the feasibility
of investigating the growth of social perception. They point
to age differences in ability to interpret various emotional expressions,
differences which we may attribute either to variations in truthfulness
of the actual representations used, or to an actual change in capacity
through which first laughter, then pain, then anger, fear, surprise,
and scorn are recognized, understood, and made capable of description.
That variation in difficulty of the pictures is not the entire cause of the
differences is indicated by the results obtained from superior adults,
where the order of difficulty is not identical with that for children.
Sex differences are not manifest, and differences caused by social
status are, if they occur, slight.

It is possible that some use may be made of the age and grade
norms of which these tables may be a starting point. We might say
that inability to interpret correctly the pictures understood by more
than half of the children of a given individual’s age or grade indicates
a certain lack of social perception. Just what the significance of this
incapacity is cannot, of course, be determined until the results of the
test are compared with the data obtained from other measures.
Correlations with social maturity (as judged by teachers), physical
maturity, intelligence, etc. are now being prepared. Measurement
by such a test of delinquents, and others whose social adjustment is
obviously inadequate, has been suggested.
























Table VI.—Percentage of Successful Responses Given by Various Grades
[Table body not recoverable; surviving column heads: Grade VI (Group III), Grade V (Group III), (Group II).]
PERMANENCE OF HIGH SCHOOL LEARNING

D. H. EIKENBERRY

Graduate Student, Teachers College, Columbia University
I. Introductory

The purpose of the present study was to determine as accurately
as possible the permanence of learning in certain subjects studied in
high school but not continued in higher institutions. The subjects of 
the study were the members of two senior classes in Educational 
Psychology, one in Rutgers College and the other in the New Jersey 
College for Women. The class in Rutgers was composed of 16 men 
and the class in the College for Women of 18 women. 

Early in the fall semester a survey of the two classes was made 
to find out what subjects could be used in the study and the number of 
men and women available in each subject out of the total of 34 
students. The available subjects and the number of students for each 
are given in Table I below: 

Table I.—Subjects Studied in High School but not Continued in College

Subject                  Semesters in H. S.    Men    Women    Total
Latin                            2               0       1        1
                                 4               3       2        5
                                 6               3       1        4
                                 8               4       2        6
  Total                                         10       6       16
Ancient History                  1               0       1        1
                                 2               9       7       16
  Total                                          9       8       17
United States History            1               0       2        2
                                 2               6       7       13
  Total                                          6       9       15
Geometry                         2               2      11       13
                                [?]              3       2        5
  Total                                          5      13       18
Physics                          2               7      11       18
Chemistry                        2               8       9       17
The complete plan for the study as finally formulated included the
following:

1. Rough determination of achievement in high school by securing
final semester marks in each subject.

2. Determination of present status in each subject by use of
standard tests.

3. Rating of high school subjects as to interest manifested in them
while in high school by use of a five point scale, A B C D E.

4. Measurement of the intelligence of the students.

With these data in hand it was possible to make the following
specific studies:

1. Comparison of scores earned on the standard tests with the
marks earned while in high school.

2. Comparison of the scores earned on the standard tests with
norms for high school pupils.

3. Correlation of intelligence with test scores.

4. Correlation of intelligence with school marks.

5. Correlation of intelligence with interest in each subject.

6. Correlation of test scores with school marks.

7. Correlation of test scores with interest ratings.

8. Correlation of school marks with interest ratings.

9. Correlation of test scores with lapse of time.

10. Sex differences.

11. Comparison of subjects as to permanence of learning.

The final semester marks were secured from the principals of the
high schools represented by the members of the two classes. A blank
form for each student was sent to the principal who was asked to give
the final semester mark, the dates the subjects were taken in high
school, and the passing mark of the school at the time the marks were
earned. These blanks were returned by all principals with all the
data asked for. The passing mark in all schools was found to be 70.

In determining the present status of each student in the subjects
not pursued since leaving high school it was decided to make use of the
following tests: Henmon, Latin Test 1, Vocabulary and Sentences;
Sackett, Ancient History Scale; Van Wagenen, American History
Information Scale A; Barr, Diagnostic Tests in American History;
Minnick, Geometry Test A; Starch, Physics Test; Bell, Chemistry
Test. All of these tests were devised for high school use except the
Van Wagenen American History Scale, and standard scores were found
with which to compare the scores earned by the college students. The
Van Wagenen Test was given to high school seniors in two New Jersey
high schools, as explained later, for the purpose of securing scores with
which to compare. All of the tests were given during November
and December. The students were given no warning beforehand, so it
was not possible for them to cram quietly on the subjects and thus
earn a higher score than they otherwise would.

In order to secure the interest ratings the students were asked to
fill out in class a blank listing the subjects of the study, so arranged
that they could check their interest while in high school in columns
headed A B C D E. The instructions for filling this blank were as
follows:

“In the proper space below please check your interest, while in
high school, in the respective subjects by checking in the proper column
according to the following plan: If the subject was your favorite,
check under column A; if not your favorite but above average in
interest, check under B; if only of average interest, check under C; if
of less than average interest but not the subject of least interest, check
under D; if the subject was detested, check under E.” These interest
ratings were secured before any of the tests were given in order that
they should not be influenced by experiences with the test. In order
to get a check on the reliability of the ratings the students were asked
to rate Chemistry at the time the Bell Chemistry Test was taken. No
mention was made of the former rating which was made about six
weeks earlier. Of the seventeen students taking the Chemistry Test
and rerating for interest there was found to be exact agreement with
the first rating in all cases except one and in that case the change was
one letter only.

The intelligence of the students was measured by a battery of three
standard tests: namely, Army Alpha, Test VI of the Bureau of
Personnel Research commonly known as “Scrambled Alpha,” and the
Terman Group Test of Mental Ability with time limits cut in two.
The results of these three tests are given in Tables II, III, and IV.
Table II shows the results for the sixteen men, giving the score earned
on each test and the rank each man holds in his group. Table III
gives the same for the eighteen women and Table IV combines the two
sexes. In these tables and all following tables men are represented by
single capital letters and women by double capitals.

That the battery of tests slightly favored the men is indicated by
the average scores in Tables II and III. For purposes of comparison
these are given below:
            Army Alpha    Test VI    Terman    Average
Men            [?]          160       167.0      106
Women         100.8         160       168.0      101
Both          171.0         163       167.8      163


Table II.—Intelligence Scores, Sixteen Men




Scrambled 

Alpha 

Terman 

Average 

Score 


Score 

Ilftnlc 

Sooro 

Rank 

Score 

Rank 

A 

M 

7M 

n 

OH 

104 

11 


SH 

B 


3 


1 

182 

3H 

182 

2 

C 


7H 

■(9 

OH 

104 

11 


8H 

D 

168 

13 

103 

6 

lOB 

0 

163 


F 


1 

174 

2 

182 

3H 

183 

1 

F 


14 

127 

10 

133 

10 


16 

G 

los 

m 

130 

16 

172 

7 

R9 

111^ 

H 

108 

m 

107 

4 

160 

14 

9 

IIH 

I 

179 

0 

171 

3 

186 

2 


8 

J 

181 

4H 

156 

11 

187 

1 

176 

5 

. K 

106 

12 

158 

8 

104 

11 


iiH 

L 

ISL 

4H 

101 

7 

171 

8 

171 

0 

M 

192 

2 

162 

0 

178 

0 

177 

4 

N 

176 

0 

152 

12 

180 

5 

169 

7 

0 

155 

16 

143 

14 

164 

13 

161 

U 

P 

141 

10 

144 

13 

147 

15 

144 

15 

Average. 

173 


160 


107.0 


160 



Correlation of Scores (Spearman’s Footrule Method)

Army Alpha and Scrambled Alpha ........ .664
Army Alpha and Terman ................. .711
Army Alpha and Average ................ .926
Scrambled Alpha and Terman ............ .468
Scrambled Alpha and Average ........... .677
Terman and Average .................... .818


This comparison shows that the women are three points behind
the men in Army Alpha, six points behind in Test VI and one point
ahead in the Terman Test. Judging from the results of the Terman
Test it would seem that the women are equal in ability to the men.
In all likelihood the somewhat poorer results on Army Alpha and Test
VI are due to the fact that these tests were devised primarily for men.

No attempt was made to reduce the intelligence scores to mental
ages or intelligence quotients as they are used only in ranking the
students as to mental ability.

Coefficients of correlation of each test with the two others and with
the average of the three were found by the Spearman Footrule
Method. These correlations are given below, Tables II, III and IV, and
for purposes of comparison are brought together on page 469.
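
The Footrule computation named above can be sketched as follows. This is a minimal illustration and not the authors' worksheet: it assumes the usual statement of Spearman's Footrule, R = 1 - 6(sum of gains)/(n^2 - 1), where a "gain" is the amount by which a pupil's rank in the second series exceeds his rank in the first. The function names and the five-pupil ranks are hypothetical, and the conversion r = sin(pi R / 2) is included only as the conversion commonly attributed to Spearman; the paper does not say whether its coefficients were converted.

```python
import math


def footrule_R(ranks_x, ranks_y):
    """Spearman's Footrule: R = 1 - 6 * (sum of positive rank gains) / (n^2 - 1).

    Tied ranks such as 3.5 may be passed as floats.
    """
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rank lists must have the same length")
    n = len(ranks_x)
    if n < 2:
        raise ValueError("at least two pupils are needed")
    # Only gains (second rank larger than first) are summed; losses are ignored.
    gains = sum(max(0.0, y - x) for x, y in zip(ranks_x, ranks_y))
    return 1.0 - 6.0 * gains / (n * n - 1)


def footrule_to_r(R):
    """Assumed conversion of the Footrule R to an estimate of r."""
    return math.sin(math.pi * R / 2.0)


# Hypothetical ranks of five pupils on two tests (not data from the study).
test_one = [1, 2, 3, 4, 5]
test_two = [2, 1, 3, 5, 4]
R = footrule_R(test_one, test_two)
print(R, round(footrule_to_r(R), 3))
```

With identical rankings the sum of gains is zero and R is 1; with a fully reversed ranking R does not reach -1, which is one reason the Footrule is regarded as a rough method.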


Table III.—Intelligence Scores, Eighteen Women

Pupil    Army Alpha      Scrambled Alpha    Terman          Average
         Score  Rank     Score  Rank        Score  Rank     Score  Rank




QQ 

AA 

R H 

8 

164 

3 

HI 

3 

n 

m 

BB 

um 

1 

162 

8H 


8 

Dl 

4K 

CC 

ITH 

181^ 

146 

14M 

H B 

8 

Bnrfl 

12 

DD 

mm 

101^ 

119 

18 

BP fl 

17 

Bk'B 

18 

EE 

166 

0 

135 

16 

■ii ■ 

12 

164 

14 

PP 

164 

IIM 

169 

6 

171 

8 

166 

8H 

GG 

161 

15 

146 

UK 

132 

18 

146 

16 

HH 

169 

10H 

150 

10 

172 

6 

160 

11 

II 

164 

liH 

169 

2 

151 

13 

161 

10 

jj : 

187 

2 

163 

r 

183 

2 

174 

2 

KKj 

162 

13H 

148 

12 

141 

16 

160 

16 

LL 

179 

6 

171 

1 

177 

■1 

176 

1 

MM 

166 

10 

146 

13 

186 

■1 

166 

8K 

NN 

170 

6 

160 

6 

169 

11 

169 

6 

00 

ISO 

3 

163 

4 

174 

5 

172 

3 

PP 

179 

6 

149 

11 

170 

10 

166 

7 

QQ 

166 

IS 

126 

17 

143 

14 

142 

17 

BU 

176 

7 

152 

8H 

138 

16 

166 

13 

Average. 



160 


■ 


161 



Correlation of Scores (Spearman's Footrule Method)

                                             r
Army Alpha and Scrambled Alpha ........... .602
Army Alpha and Terman .................... .472
Army Alpha and Average ................... .836
Scrambled Alpha and Terman ............... .486
Scrambled Alpha and Average .............. .810
Terman and Average ....................... .742


Comparison of Correlations

                                    Men     Women    Both Sexes
Army Alpha and Scrambled Alpha      .654    .602     .618
Army Alpha and Terman               .711    .472     .603
Army Alpha and Average              .926    .836     .806
Scrambled Alpha and Terman          .458    .486     .600
Scrambled Alpha and Average         .677    .810     .701
Terman and Average                  .818    .742     .844


Table IV.—Intelligence Scores, Men and Women

Pupil    Army Alpha      Scrambled Alpha    Terman          Average
         Score  Rank     Score  Rank        Score  Rank     Score  Rank




A 


14 

wm 

16M 

164 

22 


16 

B 

188 1 

4 

Wm 

1 

182 

BH 


2 

C 


14 

IS 

15H 

104 

22 

■KM 

16 

D 

168 

30 

Tft3 

8H 

108 

20 

103 

20K 

E 

193 

1 

174 

2 

182 

BH 

183 

1 

r 

150 

31^ 

127 

32 

133 

38 

130 

83 

G 

108 

18J^ 

180 

81 

172 

12H 

108 

20H 

H 

168 

18H 

187 

0 

160 

27 

103 

20H 

I 

170 

lOH 

171 

3H 

180 

2 

170 

3 

J 

181 

OH 

160 

17 

187 

1 

176 

C 

K 

106 

20H 

168 

14 

164 

22 

103 

20H 

L 

181 

OH 

161 

11 

171 

16K 

171 

10 

M 

102 

2 

102 

10 

178 

0 

177 

4 

N 

175 

10 

162 

20 

180 

7 

160 

12K 

0 

165 

83 

148 

20 

164 

26 

161 

28 

P 

141 

34 

144 

28 

147 

28 

144 

31 

AA 

100 

17 

104 

7 

170 

8 

171 

10 

DB 

189 

3 

162 

20 

171 

nn 

171 

10 

CC 

102 

26H 

146 

20H 

171 

mi 

160 

26 

DD 

169 

28H 

110 

84 

137 

32 

138 

34 

EE 

160 

20H 

136 

80 

161 

24 

164 

27 

FE 

104 

23H 

160 

13 

171 

15H 

105 

17H 

GG 

101 

27 

146 

20H 

132 

84 

140 

30 

HH 

169 

2SH 

160 

22 

172 

12H 

100 

24 

n 

164 

23H 

' 100 

6 

161 

20 

161 

23 

JJ 

187 

5 

163 

18 

183 

■■ 

174 

7 

KK 

162 

25H 

148 

24 

141 

IB 

160 

20 

LL 

179 

lOH 

171 

8H 


!■ 

170 

6 

MM 

106 

22 

140 

26 

1 186 

8 

165 

17H 

NN 

179 

lOH 

160 

12 


10 

169 

12H 

00 

180 

8 

163 

8H 

174 

11 

172 

8 

PP 

170 

10^ 

140 

23 


18 

166 

16 

QQ 

166 

31>^ 

126 

33 

,143 

20 

142 

32 

ER 

176 

14 

162 

20 

138 

31 

166 

26 

Average. 

171 


163 


167.8 


163 

Correlation of Scores (Spearman's Footrule Method)

                                             r
Army Alpha and Scrambled Alpha ........... .618
Army Alpha and Terman .................... .603
Army Alpha and Average ................... .806
Scrambled Alpha and Terman ............... .600
Scrambled Alpha and Average .............. .701
Terman and Average ....................... .844



II. Permanence of High School Learning as Determined by
Comparison of Scores Earned on Standard Tests with High
School Marks and Norms for High School Pupils on
Standard Tests

The first difficulty encountered in a study of this kind is that of
determining with any degree of accuracy the amount of any subject
ever known by the subjects of the study. Scientific accuracy would
require that initial tests must have been given to all the students at
the time they had completed the study of each subject in high school.
Lacking these initial tests the only thing we can do is to compare the
results on the tests given with high school marks and with standard
scores for high school pupils where such exist. Using McCall's formula
for a one group experimental problem:

S . . . . IT—EF—FT—C

where S is the experimental group, IT the initial test, EF the
experimental factor, FT the final test and C the change, we have EF,
the experimental factor, namely, the lapse of time since the studies
were completed in high school, and FT, the final tests. It is necessary
for us then to assume from high school marks and standard scores what
the IT in each case would have been had it been given to the students
while in high school.

The second difficulty in the study is that of determining the effect
of incidental learning. In the case of each high school subject care
was taken not to include any students who had continued the study
in college. But this does not guarantee that the subjects learned in
high school have not been kept up more or less by incidental learning
in other subjects and in ordinary ways out of classes. In the above
formula C, the change between initial and final test, is undoubtedly
smaller on account of incidental learning than it would be if we could
eliminate this factor. This seems to be especially true in the case
of United States History as will be pointed out later.

Table V.—High School Marks and Scores on Standard Tests


Pupil   Latin        Ancient history   U. S. history           Geometry     Physics      Chemistry
        H. S.  Test  H. S.  Test       H. S.  Test I  Test II  H. S.  Test  H. S.  Test  H. S.  Test

A 

00 


87 

206 








68 

66 

D 


.. 1 






88 

78 





C 

78 

mm 



,, 



70 

08 





D 



, 

1 • • 

,, 


, , 


s 1 • • 

00 

20 

87 

20 

E 

00 

so 



00 

70 

80 







F 

76 

34 

74 




, , 


.... 

86 

0 

00 

10 

O 

so 

44 

86 

234 

76 

72 

67 

70 

03 





H 



82 

118 


.... 

B 

B 

■HH 

78 

16 

74 

30 

I 

74 

48 




■Hill 

B 

H 

HM 

73 

Bl 

71 

24 

J 



04 

241 

03 

76 

B 

Hi 

m 





K 



.... 

. • « 

76 

40 

80 



06 

4 



L 

78 

42 

80 

160 

86 

00 

68 

84 

1 





M 

31 

86 

.... 





, , 


.. 


01 

44 

N 

74 

62 

74 

80 

70 

04 

03 

76 

1 60 





0 • 



.... 

• • • 


■HU 


, , 

■HH 

.. 

.... 

76 

40 

P 

76 

26 

02 

100 

•• 

Bj 

•• 


Hi 

80 

0 



ATcrase 

79 

as 

83M 

1C3 

82 

07 

60 

70 

74 


0.7 

82 

38 

AA 



.... 

. . • 

83 

61 

62 

, , ' 

. . ► ► 

B 

8 



BB . 

H 


.... 

. . i 

, , 

.... 


, , ' 

. . ► » 

mm 

30 



CO 

H 

• 1 1 

m 

134 

74 

18 

63 

80 

20 

iB 




DD 

H 

1 . . 

06 

107 ' 

06 

40 

47 

, , 

. . * • 

B 

» > . • 

06 

12 

EE 

80 { 

i 



B 

BOTI 

B 

73 

40 

74 1 

8 

73 

14 

FP 

73 

1 79 

.... 

!. • 

IB 

UIU 

B 

76 

61 





GO 

69 

Ea 

90 

139 

iBI 

■lUI 

B 

80 

60 





HH 

B 




01 

46 

63 

00 

60 

87 

11 i 

00 

16 

II 

H 

B 

84 

02 

,, 

• * e . 

* , 

70 

26 

» » 

s . • • 

73 

10 

JJ 

H 

B 

.... 


,, 

• • * » 


.. 


03 

24 



KK 

B 

B 

03 

120 

06 

43 

60 

00 

38 

05 

7 



LL 

02 

03 

. . . 

. . . 

, , 

. . . 


06 

37 





. MM 

02 

38 

88 

100 

87 

28 

66 


.... 

61 

18 



NN 

,, 

, , 

92 

187 

. , 

e • • • 


■ttl 

72 

84 

10 

01 

43 

00 





86 

61 

67 

77 

24 

73 

4 

78 

24 

PP 

78' 

10 


. . 4 

82 

63 

51 

85 

30 



70 

16 

QQ 

B 


90 

116 

88 

36 

62 

83 

88 



78 

25 

nn 


41 






■1 

03 

78 

8 

70 

20 

Average. 

64 

40 

80 

132 

80 

40H 

63 

84 

44.7 

84 

IfV! 

80 

20 

Btondard. 


100 


in 

•• 

mi 

48 

•• 


' • • 

1 

■ 

60 


Table V above shows the marks earned in high school and the scores on the standard tests.
The upper half of the table (letters A to P) includes the sixteen men; the lower half (letters AA to
RR) includes the eighteen women. Average high school marks and test scores are given for each
sex. The Latin scores are not the raw scores earned but revised scores as explained on page 471.
Scores under Test I of United States History are on the Van Wagenen Scale and scores under
Test II are on the Barr Test.


Table V above gives the marks earned in high school in the six
subjects of the study and the scores earned in the seven standard
tests used. If high school marks were highly reliable and meant the
same thing in one subject as in another, and in one school as in another,
it would be possible to take the distribution of marks in each subject
and transmute into scores on the standard tests. It would then be
possible to compare the transmuted marks with the scores earned on
the standard tests and we would be able to compute the percentage of
subject matter retained provided the standard tests were scaled from a
reliable zero point. Correlations of high school marks with test scores
were so low that no attempt was made to transmute the marks. The
coefficients are given below to show why the high school marks could
not be transmuted into scores:


                                                              r
Latin marks and scores on Henmon Test ..................... .192
Ancient History marks and scores on Sackett Test .......... .541
U. S. History marks and scores on Van Wagenen Test ........ .018
U. S. History marks and scores on Barr Test ............... .192
Geometry marks and scores on Minnick Test ................. -.176
Physics marks and scores on Starch Test ................... .276
Chemistry marks and scores on Bell Test ................... .259


(Spearman's Footrule method used in computing r's)

The best comparisons we are able to make for purposes of determining
retention are between the scores earned on the standard tests and
the standard scores for high school pupils in the same tests. It is
unfair, however, to make the comparisons with the median scores of
high school pupils, as college seniors are a more highly selected group.
In order to allow for selection we will probably not be far wrong if we
make our comparisons with the upper half only of high school pupils.
Our problem is then to compare the scores of college seniors with the
average or median of the upper half of high school pupils. From the
published results of the tests used in the study it was possible to get
the 75 percentile only in the case of the Minnick Geometry Test.
Results of the Bell Chemistry Test in fourteen Texas high schools
give the 70 and 80 percentiles from which the 75 percentile may be
estimated accurately enough for our purposes. The Van Wagenen
Test was given to 97 pupils in the New Brunswick and Elizabeth high
schools so it was possible to get the average ability of the upper half
of the group. Table VI gives the standard score for each test used,
the 75 percentile where it is known, the average scores of the college
seniors and the median number of years since completing the study of
the subject.

Table VI.—Comparison of Results on Standard Tests with Standard
Scores for High School Pupils and with Median Scores
of the Upper Half


Subject    Standard score    75 Percentile    College seniors           Years ago
                                              Men    Women    Both

Latin. 

lOQ.Q 


68.0 

40.0 

64.0 

3'4 

Ancient History. 

171.0 

. t • 1 


132.0 

147.6 

5}i 

U, S. History (Van W,)..., 

5C.6 

62.0 

67.0 

40.5 

61.0 

3K 

U, S. History (Barr). 

43.0 

.... 

■Hillil 

53.0 

56.0 

3H 

Goometry. 

02.5 

72.9 

74.0 

44.7 

63.0 

i'A 

Physics...-. 

50.0 

■81 

9.7 

13.7 

12.2 


Chemistry. 

49.0 

m 

33.0 

20.0 

26.6 

4H 


Figure 1 presents the same facts graphically. In all cases the
standard score is taken as 100, and each score is expressed as the
percentage which the score in Table VI is of the standard in Table VI.
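
The percentage described above is a simple ratio, and a minimal sketch may make the scaling concrete. The function name is hypothetical; the figures used are the Barr Test values as they are printed in the running text (standard 43, men 59, women 53) and serve only as an illustration of the arithmetic.

```python
def percentage_of_standard(score, standard):
    """Express a score as a percentage of the standard score (standard = 100)."""
    if standard <= 0:
        raise ValueError("the standard score must be positive")
    return 100.0 * score / standard


# Barr Test figures as printed in the text; illustrative use only.
barr_standard = 43.0
barr_averages = {"men": 59.0, "women": 53.0}

for group, average in barr_averages.items():
    print(group, round(percentage_of_standard(average, barr_standard), 1))
```

A value above 100 means the college seniors exceeded the high school standard on that test; a value near 50 means about half of the standard performance was shown.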


[Figure: bar chart by subject; horizontal axis, Percentage Retained, 10 to 120.]

Fig. 1.


From Table VI and Fig. 1 it is seen that retention of high school
learning seems to be highest in United States History, Ancient History,
and Geometry, and lowest in Physics, Chemistry, and Latin. As
judged from test results it would seem that college seniors who have
not studied United States History for three or four years possess
almost as much as the average high school pupils retain at the close
of their study of the subject. On the Van Wagenen Test the median
for 97 high school seniors who were just completing their study of
United States History was 56, and the 75 percentile, 62. The average
score of the men on this test was 67, of the women 40½, and for both
sexes 51. If we assume that the ability of the college seniors, while
in high school, is represented by the 75 percentile then our test
results show the men to be five points above the 75 percentile and
the women 21½ points below. Both sexes combined are 11 points
below. On the Barr Test the standard is 43. The average of the
men is 59, of the women 53, of both sexes 56. The 75 percentile is
unknown, but it is probably not far if any above the scores of the men
and women.

The high retention found in the case of United States History is no
doubt due very largely to the factor of incidental learning. Both
in and out of school college students have numerous opportunities
to keep the main facts of American History in mind. Courses in
Economics, Sociology, American Literature, and the like in school,
and newspaper and magazine reading outside of school, to say nothing
of many other contributing factors, serve to keep the bonds strong
enough for recall. The same thing is true but to a lesser degree in
Ancient History. College courses in Greek and Latin Literature, Art,
Philosophy, and the like contribute much in keeping the chief facts
from being forgotten. The comparatively high retention in Geometry
may be due partly to incidental learning in other mathematics courses
but is perhaps largely due to the fact that not memory alone was
needed to solve the problems of the test but reasoning also. The
Minnick Geometry Test A which was used in this study consists of
five problems in construction. In each case the proposition is stated
and the pupil is required to make the drawing. It is not a case of
sheer memory, but of ability to reason out the answer. This same
condition holds also in the case of the Barr Diagnostic Test in American
History. It is so constructed as not to test information alone but to
test also one's ability to reason with the facts of history. On this
test, as Fig. 1 shows, the college seniors far exceeded the norm for
high school pupils.

As judged from test results college seniors retain about half of their
high school Latin. But here again it is likely that incidental learning
is an important factor. College studies in English and Romance
Languages may serve to keep the essentials of Latin in mind.

The much lower retention in the case of Physics and Chemistry
is probably due to the fact that the students have had few if any
opportunities to apply the knowledge or to review it since leaving
high school. A great deal of science instruction is formal, in which
mastery of the text and performance of a required number of cut and
dried experiments constitute the sole requirements. Pupils too often
do not learn the relationship between the facts of science and the
world about them and as a consequence the larger part of the facts
once known soon fade away.

A detailed analysis was made of the right and wrong responses on
the Henmon Test, Van Wagenen Test, Bell Test, Starch Test and the
Sackett Test. Space does not permit giving these tabulations in full,
but in Table VII are given representative samples from the five tests.

Table VII.—Number of Correct Responses on Representative Questions
in Five Standard Tests

Test and No.   Statement of the Question, Literally               Total in   Total
of Question    or Substantially                                   Group      Right

Latin
Henmon, Vocab.
   5           Dico                                               17         17
  10           Non                                                17         11
  15           Spes                                               17          9
  20           Virtus                                             17         10
  25           Vita                                               17         14
  30           Adventus                                           17          4
  35           Appello                                            17         16
  40           Nam                                                17          6
  45           Tum                                                17          5
  50           Quisque                                            17          1
Henmon, Sentence
   5           His rebus cognitis, totus exercitus prima luce
               discesserunt                                       17          4
  10           Locus castris muniendis facilis nobis deligendus
               est                                                17          1

American History
Van Wagenen
   6           By what people was our Thanksgiving Day custom
               started?                                           16          6
  12           What was Henry Hudson looking for when he
               sailed up the Hudson River?                        15          6
  17           What group of Indian Tribes lived in Western
               New York?                                          15          9
  27d          Who was president of the Southern Confederacy?     16          7
  27t          Who secured adoption of the Missouri Compromise?   16          2
  27j          Who was the first Chief Justice of the Supreme
               Court?                                             16
  34a          Who was president during the Mexican War?          15          0
  34c          Who was president when Florida was purchased?      16          2

Ancient History
Sackett
  I    1       What was Hannibal noted for?                       17         11
       7       What was Attila noted for?                         17         10
  II   1       Name a man noted as an orator                      17         16
       4       Name a man noted as an historian                   17          8
  III  1       Historical significance of battle of Tours         17          6
       4       Historical significance of Council of Nicaea       17          3
      10       Historical significance of Peloponnesian War       17          2
  V    1       Approximate date of fall of Rome                   17          6
       6       Approximate date of Hejira                         17          0
      10       Approximate date of establishment of Roman
               Empire                                             17          2
  VI   1       Contribution of Greeks to civilization             17         17
       3       Contribution of Phoenicians to civilization        17          6
       5       Contribution of Romans to civilization             17         17
       6       Contribution of Hebrews to civilization            17         12

Chemistry
Bell Test
   5           Two substances used in making hydrogen             16          3
  10           Two gases used to bleach cloth or flowers          16          6
  15           Two commercial uses of ammonia                     16          0
  20           Commercial uses of nitrates                        16         10
  25           How many grams of water are formed by the
               combustion of 26 grams of hydrogen in air          16          0


III. Correlations of Intelligence with Test Scores, School
Marks, and Interest in School Subjects


(A) Correlation of Intelligence with Test Scores.

Subject                      No. Cases     r       Table
Latin                           17        .480       2
Ancient History                 16        .323      11
U. S. History (Van W.)          16        .790      20
U. S. History (Barr)            16        .827      20
Geometry                        18        .242      36
Physics                         16        .486      46
Chemistry                       16        .628      64
Average of r's                            .626


(B) Correlation of Intelligence with School Marks.

Subject                      No. Cases     r       Table
Latin                           17        .000       8
Ancient History                 16       -.089      12
U. S. History                   16       -.064      21
Geometry                        18       -.018      37
Physics                         16        .000      46
Chemistry                       16       -.071      66
Average of r's                           -.039


(C) Correlation of Intelligence and Interest.

Subject                      No. Cases     r       Table
Latin                           17        .694       6
Ancient History                 16        .000      14
U. S. History                   16        .816      23
Geometry                        18        .610      39
Physics                         16        .662      48
Chemistry                       16        .800      67
Average of r's                            .690



It is evident that there is a substantial positive correlation between
intelligence and test scores and a positive correlation between
intelligence and interest in school subjects. The correlations in the
latter case were computed by the Pearson cos π method. This is a very
rough method as it does not take account of relative position.
Furthermore, as is explained in connection with Table V, it was
necessary to place the C ratings above or below the average on a rather
arbitrary basis. The correlations appearing under III C, then, must
not be regarded as true correlations but as merely indicative of a
positive or negative relationship. Table III B shows no correlation
(-.039) between intelligence and school marks. This is just what
we should expect considering the fact that the subjects of the study
come from thirty different high schools. Had they all come from the
same high school we should expect a small positive correlation. The
result which we find is simply another evidence of the lack of
standardization of school marks.
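
The cos π method mentioned above works from a fourfold table in which each pupil is counted as above or below the average on each of the two measures, which is why relative position within each half is ignored. The sketch below assumes the usual form of Pearson's approximation, r = cos(π √(bc) / (√(ad) + √(bc))), where a and d are the "like" cells (both above, both below) and b and c the "unlike" cells. The function name and the counts are hypothetical and are not taken from the study.

```python
import math


def cos_pi_r(a, b, c, d):
    """Pearson's cos-pi approximation from a fourfold (2 x 2) table.

    a: above average on both measures   b: above on first, below on second
    c: below on first, above on second  d: below average on both measures
    """
    like = math.sqrt(a * d)      # agreement between the two measures
    unlike = math.sqrt(b * c)    # disagreement between the two measures
    if like + unlike == 0:
        raise ValueError("the coefficient is undefined for this table")
    return math.cos(math.pi * unlike / (like + unlike))


# Hypothetical split of sixteen pupils (not data from the study).
print(round(cos_pi_r(6, 2, 2, 6), 3))
```

Because only the four cell counts enter the formula, moving a single borderline pupil (for instance, a C rating placed above rather than below the average) can change the coefficient noticeably in groups of sixteen to eighteen.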

IV. Correlations of Test Scores with School Marks, Interest
in School Subjects, Lapse of Time, and School Marks
with Interest


(A) Correlation of Test Scores with School Marks.

Subject                      No. Cases     r       Table
Latin                           17        .192       4
Ancient History                 16        .541      18
U. S. History (Van W.)          16        .018      22
U. S. History (Barr)            16        .192      30
Geometry                        18       -.176      38
Physics                         16        .276      47
Chemistry                       16        .259      66
Average of r's                            .186


(B) Correlation of Test Scores with Interest.

Subject                      No. Cases     r       Table
Latin                           17        .876       6
Ancient History                 16        .409      16
U. S. History (Van W.)          16        .821      24
U. S. History (Barr)            16        .749      31
Geometry                        18        .610      40
Physics                         16        .860      49
Chemistry                       16        .860      68
Average of r's                            .740



(C) Correlation of Test Scores with Lapse of Time Since Subjects
were Studied in High School.—The tabulations which deal with this
matter do not lend themselves to statistical treatment for correlations.
Inspection of the tables shows no relationship between test scores and
time since subjects were studied. Differences of one, two, or three
years do not seem to affect the score which will be made on a standard
test.


(D) Correlation of School Marks with Interest.

Subject                      No. Cases     r       Table
Latin                           17        .876       7
Ancient History                 16        .000      10
U. S. History                   16        .949      25
Geometry                        18        .880      41
Physics                         16        .700      60
Chemistry                       16        .684      59
Average of r's                            .791

From the correlations under IV, A, B, C and D, it is seen that there
is a rather high degree of relationship between test scores and interest,
and between school marks and interest, but practically no correlation
between test scores and school marks nor between test scores and time
since studying the subjects.

V. Comparison of Sexes and Comparison of High School
Subjects as to Permanence of Learning

(A) Sex Differences.—The table which follows shows sex differences
in intelligence, high school marks, test scores and median interest
ratings.


Subject 

Average 

iatollh 

goncQ 

Avomgo 

morks 

Average scores 

Median 

interest 

Table 

Latin: 






M. 

160 

79 

68 

D 

1,0 

W. 

101 

84 

40 

D 

I.O 

Anoioat History: 






M. 

101 

83H 

103 

B 

10.18 

W. 

164 

80 

132 

C 

10.18 

U. S. History: 






M. 

171 

82 


B 

10, 27 

W. 

16S 

80 

40H (Vmi W.) 
63 (Barr) 

C 

19, 27 

QooniGtcy: 






M. 

170 

79 

74 

C 

35, 41 

W. 

100 

84 

44.7 

C 

36, 41 

Physics ■. 






M. 

168 

70 


C 

44, 52 

W. 

104 

84 

13.7 

C 


Chemistry! 





M. 

1G3 

82 

33 

C 


W. 

167 

80 

20 

1) 

■ 


The following facts are brought out in the above table: (1) In
each group the men surpass the women in intelligence as measured
by our tests except the group in Physics where the women are 6 points
ahead; (2) the women excel the men in school marks in all subjects
except Chemistry where the men are 2 points ahead; (3) the men excel
the women in all test scores except in Physics where the women are
4 points ahead; (4) the two sexes have the same median interest rating
in Latin, Geometry and Physics while in the remaining three subjects
the men are one step higher. It is interesting to note that in the subject
in which the women excel the men in intelligence, namely Physics,
they also excel in the test.

(B) Comparison of High School Subjects as to Permanence of Learning.

The main facts as to differences among high school subjects with
respect to permanence have already been brought out in connection
with Table VI and Fig. 1 on page 472. There it was seen that retention
seems to be greatest in United States History, Ancient History, and
Geometry and least in Chemistry, Physics, and Latin. Whether or
not these six subjects would hold the same relative positions as they
do now if we could make proper allowances for incidental learning is a
question which cannot be answered in this paper.

VI. Summary and Conclusions

1. With college seniors, as measured by the tests in this study,
retention of subject matter studied in high school but not continued
later is greatest in the case of American History, second in the case of
Ancient History; and so on down with Geometry, Latin, Chemistry,
and Physics in the order named.

2. There is a positive correlation of about .5 between intelligence
as measured by our tests and test scores.

3. There is no correlation between intelligence and school marks.

4. There is a rough correspondence between intelligence and
interest in school subjects.

5. There is practically no correlation between test scores and school
marks.

6. There is a positive correspondence between test scores and
interest in school subjects.

7. There is no correlation between test scores and the time
differences in our study.

8. There is a positive correlation between school marks and interest
in school subjects.

9. According to our tests men slightly excel women in intelligence
and in test scores while the women excel the men in high school marks.




A STANDARDIZATION OF CERTAIN OPPOSITES

FOR

CHILDREN OF GRADE SCHOOL AGE

GRACE ARTHUR
University of Minnesota

A series of group tests¹ given in the spring of 1918 to the pupils
of a grade school in St. Paul included two lists of words used as
opposites tests. Each list was composed of 20 words which I had
selected from the tables of King and Gold.² These tables give the
relative difficulty of a large number of stimulus words when used as
opposites tests for educated adults. They are based upon the responses
of one hundred subjects selected from the students and faculty of the
University of Iowa. According to these tables, the average frequency
of correct response for the first list was 99.1, and for the second list,
93.26. Hence, the first list was called the “easy opposites” test, and
the second, the “hard opposites,” and will be referred to as such
throughout this article. The words of each list were of approximately
equal difficulty, according to the tables from which they were taken,
and the halves of each list balanced: that is, the first half of the list
showed the same average frequency of correct response, for adults, as
the second half.

When these two tests were given to grade school children and the
results scored, the easy list proved to be easier as a whole than the hard
list as a whole. But it also appeared that for children, certain words
on the “easy” list were more difficult than some of the words on the
so-called “hard” list. It seemed, too, that some words of equal difficulty
for adults, as WAR, YES and ASLEEP, were of widely varying degrees
of difficulty for younger subjects. Of course, it was to be expected
that the difficulty of opposites as shown by frequency tables would be
quite different for children from what it is for adults. Free association
frequency tables compiled for children³ differ enormously from
those for adults. Moreover, those of Woodrow and Lowell show that
one of the most striking differences between the free associations of
the two groups is that adults have a very much stronger tendency
to reply to any stimulus word with its opposite than do children.

Because of this known difference between the responses of children
and those of adults in free association tests, and because of the apparent
difference in quality of response in the opposites tests just described,
it seemed best to make a special investigation with children in regard
to opposites, before leaving the opposite tests of the group scale in
their final form. Such an investigation involves determining the
difficulty of specific stimulus words for children. This may be done
by presenting the stimulus words separately under uniform conditions
and securing responses from a large number of subjects. The difficulty
of the stimulus word is shown by the number of subjects that are
unable to supply the correct opposite. The relative difficulty of the
various stimulus words is determined by comparing the number of
subjects failing each, the percentage of subjects failing each, or the
portions of the area of the normal curve included by those percentages.
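
The three measures of difficulty just named can be sketched briefly. This is an illustration under stated assumptions and not the procedure of the study: the function name and the failure counts are hypothetical, and the "portion of the area of the normal curve" is read here as the normal deviate below which the given percentage of the area falls, computed with `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist


def difficulty_measures(n_failing, n_subjects):
    """Return (number failing, percentage failing, normal deviate) for a word.

    The normal deviate is undefined when nobody or everybody fails, so it is
    returned as None in those cases.
    """
    if n_subjects <= 0 or not 0 <= n_failing <= n_subjects:
        raise ValueError("need 0 <= n_failing <= n_subjects and n_subjects > 0")
    proportion = n_failing / n_subjects
    percentage = 100.0 * proportion
    if 0.0 < proportion < 1.0:
        deviate = NormalDist().inv_cdf(proportion)
    else:
        deviate = None
    return n_failing, percentage, deviate


# Hypothetical failure counts out of 100 children (not data from the study).
for word, failed in {"WAR": 16, "YES": 50, "ASLEEP": 84}.items():
    print(word, difficulty_measures(failed, 100))
```

On the normal-deviate scale a word failed by half the subjects sits at zero, easier words fall below zero and harder words above it. Words failed by none or by all of a group cannot be placed, which is the reason for testing children of widely different ability.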

From the apparent lack of agreement between the results for adults
and those for children, it was argued that a similar though less marked
difference might be expected between the responses from children of the
upper grades and those from the lower. To adequately measure the
difficulty of a word for children, then, it must be given to children of as
many degrees of ability as possible. This might be assumed to be
especially true of words that were very easy or very hard. The very
easy words would elicit a correct response from practically all of the
upper grade children and so, as with adults, the results would be
well-nigh valueless for the purpose of ranging these words in order of
difficulty. Hence, to find the relative difficulty of the easier words,
it would be necessary to get responses from younger subjects to whom
they would offer a real test of ability. Conversely, the very hard
stimulus words would be failed by all the younger pupils, and so these
words would depend upon the upper age groups to determine their
relative difficulty. Involved in this phase of the problem is the factor
of the method of presentation and response. Means⁴ used oral
presentation and oral response; King and Gold made use of visual
presentation and oral response; while Greene⁵ preferred visual
presentation and written response. For group test purposes, the last is
perhaps the most satisfactory of all methods. It is evident, however,
that for children especially, the difficulty of a word with visual
presentation and written response may be quite different from the
difficulty of the same word with oral presentation or oral response, or
both. It is, in fact, possible that the difference between the frequencies
of correct response for children and those for adults may be partly
accounted for by reading and spelling difficulties on the part of the
younger children. That it is not entirely accounted for in this way is
indicated not only by the work of Woodrow and Lowell but by the
difference between adult responses and those of children from the upper
grades where reading and spelling difficulties have been largely
overcome. As group test conditions make visual presentation and written
response the most satisfactory method for the use of opposites tests in
a group scale, it is necessary to use the same method in the evaluation
of stimulus words to be used later in such a scale. In this way, whatever
reading and spelling factors exist may be taken into account and
minimized.

It is also evident that with a list of 20 stimulus words before him,
a subject may take two or three times as long in finding the exact
opposite of one word as he needs for the next. To class the words as
equal in difficulty is manifestly inaccurate, even though both eventually
elicit a correct response. Although the time element is discussed
by King and Gold, and again by Greene, it is disregarded in both investigations
in the evaluation of the stimulus words used. In both tables
the difficulty of the stimulus word is expressed in terms of per cent 
passing, or per cent failing to find the exact opposite, without regard 
for the time required to do it. In the work of Means, the time element 
was taken into account in the giving of the stimulus word and in the 
calculation of a point value. 

For the purpose of the present investigation it was necessary to 
select a number of stimulus-words to be standardized; to present them 
to a large number of subjects of widely varying degrees of ability;
to present the stimulus words visually and to require a written response;
to present each stimulus word separately, and to allow a uniform 
amount of time for each response. 

From the tables of King and Gold, I selected ninety-six words
having adult frequencies ranging from 100 down to 60. In order that
the actual difficulty of each word apart from all others might be determined
for the children tested, each stimulus word was printed in large
type on a strip of cardboard eight inches long and two inches wide.
These words were arranged in four piles of 24 cards each. A sheet of
ruled theme paper was given to each child. He was instructed to fold
it lengthwise, and to number the lines (24) on the face uppermost.
The following formula was then used: “Each of these cards has a word
printed on it. When I say ‘ready’ you will look at the card and write
as quickly as you can on your paper not this word, but a word that
means JUST THE OPPOSITE. When I say ‘attention’ you will stop
writing instantly whether you are through or not, and hold up your
pencil like this. If you can’t think of the opposite for any word, make
a mark like this (X). If the word printed on the card were ‘bad’ what
would you write?” After the children were shown what was meant
by an opposite by means of this word, a preliminary drill was given
with the words DOWN, GO and POOR, so that every child was made
to understand the mechanical part of the procedure. The first
instructions were then supplemented by the following: “Now remember,
you are going to write just one word after each number, and that
word is to be the opposite of the word printed on the card. Ready.”
Each card was exposed for 10 seconds, at the end of which time
“attention” was called. Thus, equal time for each word was secured.
As each group of words was finished, the child turned over his paper
and numbered the lines on the face then uppermost in preparation
for the next group. Before the last group of words was presented
the children were told that they were not expected to get them all
as some of them were hard even for grown people, but that I wanted
to see how many they could get.

The 96 stimulus words were presented to all the children above the
first grade of a city school. Thus responses were secured from 600
subjects from grades II to VIII inclusive. Twenty-five of the grade
II subjects were children of more than average ability who had been 
in school less than a year. 

The mechanical part of the work was carried out by the children
with a success that was highly gratifying. Even the little grade II
subjects responded promptly to the signals. The papers of four who
evinced too much interest in the work of their neighbors had to be
destroyed, but no other symptoms of copying were noticed either by
the teachers or myself.

When it came to scoring responses, a new problem presented itself.
Neither the standards of King and Gold nor those of Greene could be
used without modification, owing to the much greater variety of
response on the part of the children and the faulty spelling of the
younger subjects in particular. The amount of latitude to be allowed
in the matter of spelling had, manifestly, to be decided arbitrarily.
To make the results represent something more than one person's
opinion, 10 judges were secured: five psychologists and five teachers.
Each judge was given a list of the stimulus words with all the responses
received for each, and was asked to check each response that in his or
her opinion should be given full credit, and to mark in a different
way each response deserving half-credit. The judges were reminded
that this was not a spelling test, but a test in the ability to summon
correct opposites. They were asked, therefore, to disregard spelling
as such, but to grade very strictly with regard to the exactness of the
association. From the 10 scored lists thus secured, the final standards
were determined upon the following basis: in order to receive full
credit, a response had to have the full credit vote of eight judges, or the
equivalent (as seven full credit votes and two half-credit votes, etc.).
To receive even half-credit, a response needed ten half-credit votes or
their numerical equivalent, and these votes must represent at least six
of the judges: i.e., two whole votes and six half votes, three whole
votes and four half votes, four whole and two half, five whole and one
half, or six whole votes. In twenty-three cases it was found that the
scoring of a response was not consistent with the rest of the scoring.
This is not surprising out of a total of more than 3500 words to be
scored. These inconsistencies were remedied by applying the rules
which seemed to govern the rest of the scoring. Verb forms were given
full credit as well as noun or adjective, if the stimulus word could be
interpreted as having that form. Vice versa, the noun given as an
opposite was as acceptable as a verb or an adjective if the stimulus
word could be regarded as a noun. A change in the part of speech
from that of the stimulus word in form or tense received half credit.
In one case, the inconsistency lay in accepting a given spelling of a
response in one case, and in rejecting the same spelling of the same
word later when it was offered as the opposite of another word demanding
the same response.
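The voting rule just described can be sketched in a few lines of code. This is only an illustration under stated assumptions: a full-credit vote is treated as worth two half-credit votes, "the equivalent" of eight full votes is read as sixteen half-credit units, and each of the 10 judges casts at most one vote per response. The function name `credit_for_response` is hypothetical and does not come from the original study.

```python
from fractions import Fraction


def credit_for_response(full_votes: int, half_votes: int) -> Fraction:
    """Return 1, 1/2, or 0 credit for one response, given the judges' votes.

    Assumption: one full-credit vote equals two half-credit votes.
    """
    # Total weight of the votes, expressed in half-credit units.
    half_units = 2 * full_votes + half_votes
    # Number of judges who gave the response any credit at all.
    judges_voting = full_votes + half_votes

    # Full credit: eight full-credit votes or their equivalent.
    if half_units >= 16:
        return Fraction(1)
    # Half credit: ten half-credit units from at least six judges.
    if half_units >= 10 and judges_voting >= 6:
        return Fraction(1, 2)
    return Fraction(0)


# The combinations listed in the text for half credit.
for full, half in [(2, 6), (3, 4), (4, 2), (5, 1), (6, 0)]:
    print(full, half, credit_for_response(full, half))
```

Under this reading, five whole votes alone reach ten half-credit units but come from only five judges, so they earn no credit, which agrees with the text's requirement of a sixth vote.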

In Table I are given the 96 stimulus words used in the present
investigation ranged in their order of difficulty for children, from the
easiest to the hardest, with the percentage of frequency of correct
response for each, and the point value derived from this percentage.
In a parallel column are given the same words arranged in order of
difficulty for adults, with their respective adult frequencies of correct
response. The point values, in terms of σ, are secured by passing from
the percentage of correct response to the percentage of failure for each
stimulus word and reading directly from a table that is based upon the
area of the normal curve and assumes the base line to be broken off
at ±3.00 σ. The abilities involved in this investigation are assumed
to fit this curve and the percentage of subjects who fail to give the
correct opposite to a given stimulus word corresponds to the percentage
of the area under the curve from the 0 point to a point on the base
line, measured in units of σ. The stimulus words are numbered in
order that the differences in order of difficulty of the words for the two
groups can be grasped readily. YES ranks as the easiest word for
both groups, and CONSERVATIVE as the hardest. The greatest
amount of displacement shown by any word is 80 for BIG, which has
a rank of 8 for children and 88 for adults. This difference in rank is
no doubt partly due to the methods of scoring. The word showing the
next greatest difference in position for the two groups is COUNTRY
with a difference in rank of 64. This difference cannot be so readily
explained. Neither can the fact that IN, which ranks fourth for
children, falls to the fifty-fourth place for adults. HOT, ranking
seventh for children, holds for adults the fifty-fifth place. The total
displacement for the 96 words is 1493, and the average displacement
per word is 15.5. The correlation between the two lists, determined
by the Pearson method adapted to rank differences, was found to be
.74, PE = .03.
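The derivation of a point value from a percentage of correct response can be illustrated with a short sketch. It assumes that the table lookup described above is equivalent to taking the inverse of the normal distribution at the proportion failing and measuring from a zero point placed 3 σ below the mean; any renormalization for the cut-off tails is ignored. The function name `point_value` is hypothetical.

```python
from statistics import NormalDist


def point_value(percent_correct: float, cutoff_sigma: float = 3.0) -> float:
    """Convert a percentage of correct response into a point value in sigma units.

    Assumption: the zero point lies `cutoff_sigma` below the mean of a unit
    normal curve, and the proportion failing equals the area below the word.
    Percentages of exactly 0 or 100 are not defined under this scheme.
    """
    proportion_failing = 1.0 - percent_correct / 100.0
    # Position on the base line, relative to the mean, in sigma units.
    z = NormalDist().inv_cdf(proportion_failing)
    # Shift so that the broken-off lower end of the base line is zero.
    return round(cutoff_sigma + z, 2)


for word, pct in [("sharp", 72.00), ("open", 74.58)]:
    print(word, pct, point_value(pct))
```

For a word passed by 72.00 per cent of the children, this sketch gives a value close to the 2.42 that is legible in Table I, which suggests the assumed reading is at least approximately the one used.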

Further inspection of Table I substantiates the impression gained
in scoring the responses to the lists of the original group scale. Of the
words of the original “hard” list, half are easier than WAR on the
“easy” list. Of the “easy” list, only two words have as high a
frequency of correct response for children as OUTSIDE on the “hard”
list. For adults, QUICK, SOMETHING and LOVE are the three
easiest words on the “hard” list, with a frequency of 95 per cent. For
children, QUICK ranks fourth in difficulty of the words of this list,
SOMETHING ranks eighth, and LOVE ranks eleventh.

Greene’s list was selected from the Hard Stimulus Words of King
and Gold. Thirty-two words of his list appear also in the present
investigation. A comparison of the ranking of those words according
to his results with their ranking for children yields a total displacement
of 173, an average displacement of 5.4 per word, and a correlation
of .76, PE = .05.
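The displacement figures and probable errors reported in these comparisons can be checked with a small sketch. The probable-error formula used here, 0.6745(1 − r²)/√n, is the conventional one for a product-moment coefficient; the text does not say which formula was actually applied, so this is an assumption. All function names are hypothetical.

```python
from math import sqrt


def displacement(rank_children: int, rank_adults: int) -> int:
    """Absolute difference between a word's rank in the two lists."""
    return abs(rank_children - rank_adults)


def average_displacement(total_displacement: float, n_words: int) -> float:
    """Mean displacement per word."""
    return total_displacement / n_words


def probable_error_of_r(r: float, n: int) -> float:
    """Conventional probable error of a correlation coefficient (assumed formula)."""
    return 0.6745 * (1.0 - r * r) / sqrt(n)


# Ranks quoted in the text: (rank for children, rank for adults).
ranks = {"BIG": (8, 88), "IN": (4, 54), "HOT": (7, 55)}
for word, (child, adult) in ranks.items():
    print(word, displacement(child, adult))

# Greene comparison: 32 words, total displacement 173, r = .76.
print(round(average_displacement(173, 32), 1))
print(round(probable_error_of_r(0.76, 32), 2))
# King and Gold comparison: 96 words, r = .74.
print(round(probable_error_of_r(0.74, 96), 2))
```

With the assumed formula the two probable errors round to .05 and .03, in agreement with the values given in the text.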

Arguing from these comparisons of the responses of the 100 educated
adults of King and Gold and those of Greene’s larger group of college
freshmen with the responses of 600 children of a given grade school,
it would seem that the ability to give opposites differs in children from
that of adults not only quantitatively but qualitatively; that the
children are no more “adults in miniature” in regard to this ability
than they are in other respects; and that the ease or difficulty of a
test of this ability for adults gives only a very rough index of its ease
or difficulty for children. What factors constitute this ease or difficulty,
it is not the purpose of this investigation to determine. It
would be quite possible to eliminate the factor of group reaction by
presenting the same stimulus words to the same subjects in the same
manner, individually, and to eliminate reading and spelling by using
a different method of presentation and response. It might be worth
while, also, to construct a table showing the school grade in which
the various stimulus words reach adult frequency. For instance,
SUMMER, with an adult frequency of 100 per cent reaches the same
frequency of correct response among grade V subjects, while CONSERVATIVE,
with an adult frequency of 60, has for even grade VIII
subjects a frequency of only 1.04 per cent.

As stated at the outset, the reason for this investigation was the
need for definite information as to the relative difficulty of certain 
stimulus words when used as opposites tests for children, in order that 
lists of words of known difficulty might be constructed for use in a 
group scale. It is evident that Table I furnishes the desired data, and 
that from it such lists can be prepared of hard or easy stimulus words, 
and of words of equal or graded difficulty, according to the purpose of 
the test. 

It is interesting to compare the lists of opposites used by other
investigators with the results shown by this table. Whipple publishes
three lists of “easy opposites” prepared by Woodworth and
Wells. “Lists I and II ... are presumed to be of equal difficulty
and to be so arranged that the last half is just as difficult as the first
half.” List III, which is the list of easy opposites recommended by
Whipple, “is a selection of the 20 easiest opposites in Lists I and II.”
Fourteen words of List I, thirteen of List II and fourteen of List III
were used as stimulus words in the present investigation. Those
from List I with their respective frequencies for children are: soft
85.08, white 80.58, up 88.16, dead 76.66, hot 87.75, asleep 80.33, wet
86.50, high 86.66, dirty 83.66, east 85.50, day 82.83, yes 92.08, wrong
82.83, empty 81.00.

The thirteen from List II are: north 81.83, sour 71.08, weak 66.00,
good 84.00, after 63.91, above 76.25, sick 74.25, rich 91.76, love 61.50,
tall 62.75, open 74.58, new 85.16, come 86.58.

Those from List III are: high 86.66, white 80.58, yes 92.08, above
76.25, north 81.83, wet 86.50, good 84.00, rich 91.76, up 88.16, hot
87.75, east 85.50, day 82.83, big 87.66, love 61.50.

As the lists are not complete, and as there is no way of making
allowance for possible differences in methods of procedure and scoring
between this and other investigations, comparisons are of necessity
only suggestive and cannot be made the basis of any definite conclusions.
But it would seem that when used as group tests for children,
Lists I and II are far from being of equal difficulty, and List III
includes some fairly difficult opposites and omits many of the easiest.
For the words here given, List I shows a range in frequency of correct
response for children of 15.42 per cent, from 92.08 to 76.66; List II,
according to Whipple, “of presumably equal difficulty,” ranges from
91.76 to 62.75, which is a range of 29.01 per cent. List III, made up of
the “easiest opposites from Lists I and II” has a range of 30.58, or
from 92.08 to 61.50 per cent.

As a substitute for List III, I would submit the following list of 20
stimulus words that according to Table I are among the easiest for
children: rich 91.76, outside 90.83, in 89.16, young 88.75, up 88.16,
hot 87.75, big 87.66, high 86.66, come 86.58, wet 86.50, winter 85.75,
east 85.50, new 85.16, soft 85.08, brother 85.00, good 84.00, dirty
83.66, easy 83.16, day 82.83, wrong 82.83.

This list has the advantage of being made up of words that vary but 
slightly from each other in difficulty. The range in frequency of 
correct response is only 8.93 per cent as compared with the range of at
least 30.58 of “List III.” The words are presented in their order of
difficulty, but can easily be arranged to give a list in which the halves 
and quarters are balanced. 
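The comparison of ranges made here is simple arithmetic and can be reproduced as follows. The frequencies are those quoted in the text for the proposed list and for the fourteen List III words used in this investigation; the helper name `frequency_range` is hypothetical.

```python
def frequency_range(frequencies: dict[str, float]) -> float:
    """Difference between the highest and lowest per cent passing in a list."""
    values = frequencies.values()
    return round(max(values) - min(values), 2)


# The proposed substitute list of 20 easy words.
proposed = {
    "rich": 91.76, "outside": 90.83, "in": 89.16, "young": 88.75,
    "up": 88.16, "hot": 87.75, "big": 87.66, "high": 86.66,
    "come": 86.58, "wet": 86.50, "winter": 85.75, "east": 85.50,
    "new": 85.16, "soft": 85.08, "brother": 85.00, "good": 84.00,
    "dirty": 83.66, "easy": 83.16, "day": 82.83, "wrong": 82.83,
}

# The fourteen words of List III that were used in this investigation.
list_iii = {
    "high": 86.66, "white": 80.58, "yes": 92.08, "above": 76.25,
    "north": 81.83, "wet": 86.50, "good": 84.00, "rich": 91.76,
    "up": 88.16, "hot": 87.75, "east": 85.50, "day": 82.83,
    "big": 87.66, "love": 61.50,
}

print(frequency_range(proposed))   # range of the proposed list
print(frequency_range(list_iii))   # range of the List III words
```

The same helper could be applied to each half or quarter of a rearranged list to check how well the parts are balanced.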

If, instead of the above list of easy words of equal difficulty, parallel 
lists of easy words of gradually increasing difficulty are needed, the
following can be used: 


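young ........ 88.75      in ........... 89.16
hot .......... 87.75      up ........... 88.16
high ......... 86.66      big .......... 87.66
come ......... 86.58      wet .......... 86.50
east ......... 85.50      winter ....... 85.75
new .......... 85.16      soft ......... 85.08
good ......... 84.00      brother ...... 85.00
easy ......... 83.16      dirty ........ 83.66
wrong ........ 82.83      day .......... 82.83
empty ........ 81.00      north ........ 81.83
quick ........            white ........ 80.58
neat ......... 79.58      asleep ....... 80.33
dead ......... 76.66      heavy ........ 77.16
open ......... 74.58      above ........ 76.25
late ......... 73.75      sick ......... 74.25
something .... 72.83      first ........ 73.25
sour ......... 71.08      sharp ........ 72.00
country ......            stay ......... 69.50
here ......... 68.08      land ......... 68.08
weak ......... 66.00      sell ......... 67.41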


The range for these 40 stimulus words is only 23.16 per cent and
this again is less than that of Whipple’s easiest 20. Balanced lists,
as contrasted with graded lists, made up from them, would probably
be of more nearly equal difficulty than those in use for children up to
the present time.

For use in the group scale, the lists of Table II were arranged. They 
begin with the very easiest words of Table I and grade upward by 
approximately equal and parallel steps to the hardest. This arrangement
tends to eliminate the erratic results that sometimes follow
when children are discouraged by the difficulty of hard stimulus words 
that appear near the top of a list, and so never reach the easier words 
farther down with which they could cope successfully. By the use of 
these lists, equal time can be allowed for all grades from the second to 
the twelfth, inclusive. The score can be determined by simply 
counting the number of correct responses, or by adding their point 
values. The use of parallel lists tends to increase the reliability of
the final score. These lists should afford an adequate measure of the 
ability of a child to give opposites, as they are based upon the per¬ 
formance of other children. 
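The two ways of scoring mentioned above, counting correct responses or adding point values, can be sketched as follows. The point values shown are three that are legible in Table I; weighting a half-credit response by half its point value is an assumption of this sketch, since the text does not say how half credits enter a point-value score. The function names are hypothetical.

```python
def count_score(credits: dict[str, float]) -> float:
    """Score as the sum of full (1.0) and half (0.5) credits."""
    return sum(credits.values())


def point_score(credits: dict[str, float], point_values: dict[str, float]) -> float:
    """Score as the sum of point values, each weighted by the credit earned.

    Assumption: a half-credit response earns half of the word's point value.
    """
    return sum(point_values[word] * credit for word, credit in credits.items())


# Point values in sigma units as read from Table I.
point_values = {"sharp": 2.42, "open": 2.34, "sour": 2.44}

# One child's hypothetical paper: full credit, half credit, and a failure.
credits = {"sharp": 1.0, "open": 0.5, "sour": 0.0}

print(count_score(credits))
print(round(point_score(credits, point_values), 2))
```

The simple count treats every word alike, whereas the point-value score gives more weight to the harder words; either can be computed from the same record of credits.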

That these words had been adequately standardized for children 
was demonstrated two years later, when, in the spring of 1921, the 
lists of Table II were incorporated in the group intelligence scale above
referred to and given to all the pupils of Chisholm above grade I.
An interval of one month was allowed to elapse between the giving
of the two series. The following formula was used: “This paper has
some words printed on it. I want to see how quickly you can write a
word after each of these words that will mean just the opposite. If
the word on the paper were ‘cold’ you would write . . .? If the
word on the paper were ‘little’ you would write . . .? and so on.
(BAD and WHITE were also used as additional illustrations where
needed). When I say ‘ready,’ take your pencil in the hand you write
with, and take hold of the corner of the paper with your other hand.
When I say ‘turn,’ turn your paper over and begin to work as fast
as you can. Keep on working until I say ‘stop.’ But the instant I
say ‘stop,’ put your pencil down. Ready. Turn.”

Equal time, 90 seconds, was allowed all groups. To insure uniformity
of conditions, I gave all tests in person. Norms were
determined for age groups 6 to 18, inclusive, for both lists. The 17 
and 18-year age norms might be modified by the results from addi¬ 
tional cases, as the latter especially appears to be a somewhat selected 
group. Table III presents these norms together with the number of 
cases for each age group and the standard deviation. In scoring, the 



The Journal of Educational Psychology

Table I.—Stimulus Words in Order of Difficulty for Children, with
Percentages of Correct Response for Each, and Point Value in Terms
of σ; and the Same Words Arranged in Order of Difficulty for
Adults, According to the Tables of King and Gold


Stimulus | Children's frequency | Point value | Stimulus | Adults' frequency


82.0S 

1.50 


100.0 


1.02 


100.0 



1.07 


loa.o 


80.1C 

1.77 


100.0 



1.70 


100.0 


88.10 

1.82 


loo.o 



1.84 


100.0 


S? do 

1.84 


100.0 



1.80 


100.0 


80.88 

1.80 


00.0 



1.00 


OD.O 


SB 7B 

1.03 


00.0 



1.04 


00,0 



1.00 


. 90.0 

IR HnU,. 


1 1.00 


90.0 


SO.QQ 

' I.QA 


00.0 





00.0 

IS dirty.,.. 




00.0 


89.18 

2.0>I 


00.0 


82.83 

2.05 


' 06.0 



3.06 


' 08.0 



2.00 


08.0 


1 81.00 

8.10 


08.0 


80.68 

8.14 ' 


08.0 


80.68 

2.14 


98.0 


80.33 

2. IS 


03.0 


70.68 

2.17 


08.0 


77.10 

2.20 


08.0 


70.08 

2.27 


08.0 

30 above. 

70.26 

2.80 


07.0 

31 open. 

74.68 

2.34 


07.0 


74.26 

2.36 


07,0 


73.76 

2.30 



34 flrat.;,, 

73.26 

8.38 

weak.. 

00.0 

■9B Bomotlilng. 

72.83 

2.30 


06.0 


72.00 

2,42 



37 »our.. . , 

71,&S 

2.44 




60.50 

2 49 








40 land. 

08.08 

2.68 



41 hero.... ,,,, , 

08.08 

2.63 

Korc. 

04.6 

42 Boll. 

07.41 

2.66 



43 wnnk. 

00.00 

2.60 



44 wild... . .. 

05.01 

2.60 



46 remomber. 

06,00 

2.01 



46 after

63.91

2.64


94.0
Table I.—(Continued)


Sllmulue 



47 alrnlB^l. 

48 .. J2.6Q 

60 less. 

61 .. 

62 .. B8.33 

63 win. 

64 bcavitilul. 65.41 

65 best. ^3,75 

60 buy . 

57 .. 

68 bncbwnrclB. 62.33 

60 left. 

00 over... ^8.10 

01 .. 

02 beginning. 

03 forgot. 

04 . ^2.00 

.. •”-'® 

00 push. 41.00 

07 .. 

08 .. 40.16 

00 .. 30.10 

70 clioap... 

71 ninny. 30 -®O 

72 .. 35.00 

73 .. 34."o 

74 flbftt. 

76 lake . 

70 Innocent. 32,00 

77 tender. 20.33 

78 Btrongth. 27.25 

70 fftlie. 23,33 

80 .. 22.76 

81 rude. ^2-2® 

82 like. 

83 succeed. 

84 vortical. ^®‘2® 

88 blesB. 7.W 

80 Billy. 

ftO frociucnlly. 

01 refined. 

92 .. 

03 permit. 2.11 

04 .. 

05 . ' I 

00 conBctvftlivo. 




many.. 

auccecd. 

In... 

het. 

Bbongtb. 

Wl...,,. 

tcH. 

beginning. 

land. 

stay... 

id>Bont. 

Iraa...... 

newbere. 


gonorouB., 

blesB. 

like.,. 

pait. 

almple.,.. 

dok. 

lack. 

to take, 
rcfincc]... 
false,.... 
alnle....- 


rmec. 

lo float. 

yertioal. 

Billy. 

first.. 

beet.. 

push. 

tender. 

post. 

Injurious. 

Innoocnti.... 

big... 

permit. 

rude. 

mueh.... 

• Ireqncntly... 

oonntry. 

stormy. 

striot. 

I eonsorvativo 










































































































point value of a word was disregarded and the score was simply the 
sum of the full credit and half-credit responses. From these results
it would appear that the lists were, for the children of Chisholm, of
practically equal difficulty. Below the 17-year age group, the largest
difference between the norms for the two lists at any age level is .03.
At the 17-year age level, the difference is 1.11, but this may be due to
the limited number of cases at this point, as the difference between the
norms for the two lists at the 18-year level again drops to .60.

The standardization of a large number of additional stimulus words
for children by the same method as that described above, would make
possible the arrangement of new tests as alternatives to those here
presented, and of others of various degrees of difficulty as the need
for them in the practical use of opposites tests becomes evident.


Table II.—Parallel Lists of Graded Difficulty, Based Upon Children's
Frequency Tables


                    Per cent
                    passing

new ............... 85.16
north ............. 81.83
heavy ............. 77.16
sharp ............. 72.00
weak .............. 66.00
love .............. 61.50
best .............. 53.75
over .............. 48.16
push .............. 41.66
much .............. 36.00
tender ............ 29.33
false ............. 23.33
rude .............. 19.25
stormy ............ 14.08
bless .............  7.91
frequently ........  4.58
injurious .........  3.16
strict ............   .83


FURTHER NOTE UPON THE PROBABLE ERROR
OF THE MEAN

TRUMAN L. KELLEY 
Stanford University 

In Dr. Holzinger’s reply¹ to my note² upon his formula (6)³ for the
probable error of the mean, occur the following statements:

(a) “I therefore suggest that formula (1)⁴ be held in abeyance
until further evidence is at hand.”

(b) “I do not recommend the use of formula (3)⁵ as a substitute
because, even though more reasonable than (1) assumptions as to the
correlation of errors are still not clearly justified.”

(c) (After a brief development) “. . . formula (6) follows at
once, since = tr*. Thus Professor Kelley's reasoning may lead
him to the formula to which he is objecting.”

As I cannot at all agree to the equation

+ (Ta

and further cannot follow Dr. Holzinger’s reasoning where he states
“formula (6) follows at once,” I see no chance that my reasoning
would lead me to formula (6). However, since Dr. Holzinger has
withdrawn this formula there is no point in further defending my
objection to it.

Dr. Holzinger introduces in his reply to my note the following
formula (2):

and raises a question as to the authorship of it which it is incumbent
upon me to note. When I first derived formula (2)⁶ I was unaware that
Spearman had reached the same result. Spearman’s formula for the
standard deviations of sums and the formula for the correlation
between sums and differences⁷ have many very important applications
and I am glad to have my attention called to the fact that the important
formula (2) is one of the applications which Spearman noted.
Let me at this time further express my regret that when I derived
formulas for the standard deviations of sums and for the correlations
between sums⁹ I was unfamiliar with Spearman’s work⁸ and therefore
did not credit him with the formulas which I gave.

Dr. Holzinger’s statement (b) quoted above has no bearing which
I can ascertain upon the question of the probable error of the mean,
but I am glad formula (2) was mentioned as it gives me the opportunity
for expressing my high regard for Spearman's contributions and my
regret that upon at least two occasions I have not duly credited him.

¹ Journal of Educational Psychology, September, 1923.

² Journal of Educational Psychology, September, 1923.

³ Journal of Educational Psychology, May, 1923.

=<r,b(2-r,5).

— tJlCig. ,

⁶ Journal of Educational Psychology, November, 1919 (not Journal of Educational
Research, May, 1921).

⁷ British Journal of Psychology, vol. 5, 1913.

⁸ British Journal of Psychology, vol. 5, 1913.

⁹ Tables, etc. Bulletin of the University of Texas, No. 27, May, 1916.



NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 
OTHER MAGAZINES 


REPORTED BY CECILE COLLOTON
Department of Educational Psychology, The Lincoln School of Teachers College


Intelligence Tests

The Use of Intelligence Tests in Educational Administration in the Providence
Schools. Richard D. Allen. School and Society, 1923, Sept. 22, 336-339. Describes
the changes in classification and promotion in a school system where the
intelligence test is wisely used.

Tests of Candidates for the Rhodes Scholarship. J. B. Miner. School and
Society, 1923, Sept. 8, 297-300. Reports a highly satisfactory use of intelligence
tests as supplementary evidence to personal histories and interviews in choosing a
Rhodes Scholar.

A Study of the Data on the Results Gathered from Repeated Mental Examinations
of 200 Defective Children Attending Special Schools over a Period of Eight Years.
Meta L. Anderson. Journal of Applied Psychology, 1923, March, 64-64. Discusses
six types of mental growth curves of feeble minded children and summarizes
variations in IQ.

The Freshman: Thorndike College Entrance Tests, First Semester Grades, Binet
Tests. W. I. Root. Journal of Applied Psychology, 1923, March, 77-92. Reports
a study of 600 Freshmen of the College of the University of Pittsburgh with
special emphasis on the prognostic value of the Thorndike test.

A Mental Survey of 69 Normal School Students: Some Correlations and Criticisms.
Lawrence A. Averill. Journal of Educational Research, 1923, April,
331-337. Otis Group Examination Scale, Thurstone Psychological Examination,
analysis of scholastic grades, and instructors’ estimates of native ability supply
the data reported in this study.

A Study of the Binet and Terman Intelligence Tests with Eleven-year-old Children.
George T. Avery. Journal of Educational Research, 1923, May, 429-433. Reports
the Terman Group Test of mental ability as a reliable instrument for classification.
Shows correlation of .80 with the Stanford-Binet.

Measurement of the Effectiveness of Differentiation of High School Pupils on the
Basis of the Army Intelligence Tests. Clifford Woody. Journal of Educational
Research, 1923, May, 397-409. From the results of a study of the scores of 83
high school freshmen on the Army Alpha Intelligence Test and four educational
tests the author states that any good intelligence test can be used satisfactorily for
grouping students of homogeneous abilities.

Some Uses of the Freshman Test in the Smaller College. School and Society,
April 14, 416-417. The Thurstone Intelligence Test used with 460 subjects at
Sweet Briar College over a period of three college grades does have predictive
value and gives information of value to the administrator and the teacher.

Grading and Promotion. Joseph S. Taylor. School and Society, 1923, April
14, 406-409. The limitations of intelligence tests and their value in the study of
individual capacities.

The Stanford-Binet Tests in Some Schools. P. L. Gray and R. E.
Marsden. Journal of Educational Research, 1923, September, 150-155. Reports
the satisfactory use of the Stanford-Binet in English schools. Suggests a few
changes in wording and in location of separate tests.

Relation of General Intelligence to the Persistence of Educational and Vocational
Plans of High School Pupils. William Martin Proctor and Helen Ward.
April, 277-288. Reports the second check-up on 771 high school pupils whose
vocational ambitions, educational plans, school success, and general intelligence
scores had been secured in 1917-1918.

The Reliability of MA and IQ Based on Group Tests of General Mental Ability.
Arthur I. Gates. Journal of Applied Psychology, 1923, March, 93-100. Shows
the great variation in MA and IQ computed from different group tests and warns
against such a use of group test scores.

Democracy, Determinism and the IQ. John C. Almack, James F. Bursch, and
James C. DeVoss. School and Society, 1923, Sept. 8, 292-295. Answers some of
the arguments of the opponents of intelligence testing.

Again: Educational Determinism. Truman L. Kelley. Journal of Educational
Research, 1923, June, 10-19. An analytical discussion of the main points of
disagreement between Bagley and Terman.

Transmutation of Scores between Binet Tests and Group Tests. T. Root.
Journal of Educational Research, 1923, April, 338-341. Five tables of transmuted
scores of value to the administrator for rough comparison.

Educational Tests

The Briggs Form Test in Use. C. C. Certain. The English Journal, 1923,
April, 244-267. Shows how publicity given to group and individual test scores
aids in definite improvement. Explains procedure and forms required for the test.

Are Your Pupils up to Standard in Composition? C. C. Certain. The English
Journal, 1923, June. Presents typical procedures for the use of standard tests in
composition in the high school and explains how to interpret results.

Why Not Include Standard Tests in Your Teaching Program This Year? C. C.
Certain. The English Journal, 1923, September, 463-480. Outlines a definite
schedule of standard tests for use in English classes and gives detailed directions for
scoring papers, and tabulating and interpreting results.

A Survey of the Test Movement in History. Paul Tyler Kepner. Journal of
Educational Research, 1923, April, 300-325. Presents “(1) the methodology of
constructing tests; (2) the types of tests in the field and their nature; (3) the
criticism, destructive and constructive, passed upon the various tests.”

A Study of the Validity of the Courtis and Studebaker Practice Tests in the Fundamentals
of Arithmetic. W. J. Osburn. Journal of Educational Research, 1923,
September, 93-106. Shows that each test covers less than two-thirds of the total
field and affords too little practice on particular difficulties such as zero combinations,
division with remainders, etc.

The Use of American Tests to Measure English Teaching in China. Sterling G.
Brinkley. Journal of Educational Research, 1923, September, 136-144. Summarizes
the results of English, reading, language and spelling scales used in a
typical college preparatory middle school of East China. Raises questions for the
modern language teacher.

Years in School and Achievements in Reading and Arithmetic. Chas. L. Harlan.
Journal of Educational Research, 1923, September, 145-149. Results from the
Monroe Silent Reading Tests and the Monroe Reasoning Tests in Arithmetic show
that it is not the amount of time spent, but how and by whom the time is spent that
counts for achievement.

Science of Education. William A. McCall. Journal of Educational Research,
1923, May, 381-390. Describes the construction of an arithmetic test for use in
China.

Miscellangoob ‘ 

Training for Leadership. Garry C. Meyers. School and Society, 1923, April
21, 437-439. Argues for the development of sense of social obligation and traits
of sympathy and likableness in gifted children.

The Man of Genius. S. C. Kohs. School and Society, 1923, April 21, 429-431.
Pleads for the moral education of the child of superior intelligence.

From Him That Hath Brains. Louis A. Pechstein. Educational Review,
1923, June, 1-9. Uses data from both grade children and college students to show
how the IQ and AQ operate in measuring the educational efficiency of the
individual.

A Study of Ten Gifted Children Whose School Progress Was Unsatisfactory.
Dorothy Van Alstyne. Journal of Educational Research, 1923, September,
122-135. Reports the various means used to secure individual analysis, and
the remedial measures used to bring each child up to the level of his ability. Two
typical cases are given in detail.

Some Characteristics of Leadership. L. Ruth Nutting. School and Society,
1923, Sept. 29, 387-390. A study of the reasons given for choice of captains in
seventh and eighth grade girls' gymnasium classes shows good judgment as to the
qualities of importance to a leader. Qualities are listed. Comparisons with
intelligence and popularity ratings are made.

Mental Attitude of Children towards School Work. Arthur W. Kallom. School
and Society, 1923, Sept. 22, 339-343. Emphasizes the necessity for following up
test results by intensive study of individuals who are exceptional in any way.

The Study of Individual Abilities: The Normal Child. Walter F. Dearborn.
School and Society, 1923, Sept. 22, 331-336. Repeated tests, both mental and
physical, of the same children over a period of 10 to 12 years will answer many
questions about individual development. Such an experiment has been begun
at Harvard.

Measuring Interest Objectively. Harold E. Burtt. School and Society,
1923, April 21, 444-448. Describes four tests designed to measure interest and
industry used with 43 students specializing in agricultural engineering at Ohio
State University. Results are interesting though not conclusive.

The Downey Will-Temperament Group Test: Analysis of Its Reliability and
Validity. G. M. Ruch and M. C. Del Manzo. Journal of Applied Psychology,
1923, 65-78. Seven tables report the statistical analysis of the scores of 140
high school students tested with the Downey Group Will-Temperament Test.
Six detailed conclusions summarize the most important findings and emphasize
the need for a more valid instrument for measuring volitional traits.

Racial Difference as Measured by the Downey Will-Temperament Test. John P.
McFadden and J. F. Dashiell. Journal of Applied Psychology, 1923, March,
30-63. Six tables, six figures, and a summary give the results and conclusions of a
study of the comparison of temperament in individuals of the white and negro
races.

An Experiment in Testing Engine Lathe Aptitude. Everett F. Patten. Journal
of Applied Psychology, 1923, March, 10-20. Describes the analysis of engine
lathe ability by observation of men at work and the selection of tests to measure
this ability.

Herbert: A Study of Difficulty in Spelling and Reading. Bernice Leland.
Journal of Educational Research, 1923, June, 49-68. A case study of special
disabilities and report of remedial measures employed.

New Thoughts about the Feeble-Minded. Edgar A. Doll. Journal of
Educational Research, 1923, June, 31-48. Emphasizes the importance of the special
class in the social and industrial training of the feeble minded.

Another Criticism of Tests Requiring Alternative Responses. C. W. Odell.
Journal of Educational Research, 1923, April, 326-330. Advantages of
True-false Tests emphasized.

A Critical Study of the Right Minus Wrong Method. Paul V. West. Journal of
Educational Research, 1923, June, 1-9. Shows data to prove the unreliability
of the right minus wrong method of scoring.

The Effect of First-year Latin upon Knowledge of English Words of Latin
Derivation. E. L. Thorndike and G. J. Ruger. School and Society, 1923, Sept. 1,
260-270. Results from the Carr English Vocabulary Test used in ninth grade classes in
schools representing 19 states show that Latin students made much greater gains than
non-Latin students in Latin derived words. Twelve tables give statistical data.

An Individual Record Card for Preserving Test Data. Grace Arthur. School
and Society, 1923, Sept. 22, 36[?]-367. Describes a single card used in Chisholm,
Minnesota, for recording standard test results.



NEW PUBLICATIONS IN EDUCATIONAL 
PSYCHOLOGY AND RELATED FIELDS OF 
EDUCATION 


CONDUCTED BY LAURA ZIRBES¹


1. A Book on Special Abilities and Disabilities. For a generation
data have been accumulating concerning special abilities and defects
and their relation to general mental ability. Dr. Leta S. Hollingworth
has done a helpful bit of work in collecting and interpreting the
evidence in her recent book.*

It is not a detailed case-book like Dr. Bronner's "Special Ability
and Disability," or a report of new investigations like Dr. Hollingworth's
earlier monograph, "The Psychology of Special Disability in
Spelling," or Dr. Gates's "Psychology of Reading and Spelling."
It is an epitome and evaluation of both theories and the data of
investigations.

For example, the author sets forth certain current positions on the 
relationship between “capacities” (throughout the book the term 
capacities is used to mean present abilities, not merely native capacities)
by contrasting the Spearman and the Thorndike points of view. By 
a review of the data of correlation she discusses the issue of general 
intelligence vs. specific abilities. The helpfulness of her book to the 
student of education and of educational psychology is well illustrated 
by the way she brings the literature of such problems up to date. The 
book is an inductive exhibit of the contemporary dynamic psychology. 
It uses throughout the data of mental and educational measurement 
and emphasizes the specificness of mental activity. 

Special talents and defects mean, to Dr. Hollingworth, those
abilities demanded of an individual for success in particular school
subjects: reading, spelling, arithmetic, music, and drawing. Abilities
involved in other school subjects such as language (composition and
grammar) and history and related social subjects are not discussed.
The author's treatment of the neural basis of mental life may be
summarized fairly by this sentence: "Experimental neurology has
nothing secure to offer by way of establishing the neural basis of the
special talents and defects."

¹ All unsigned reviews were prepared by Laura Zirbes.

* "Special Talents and Defects." New York, Macmillan, 1923, pp. xix + 216.

The chapter on the psychology of reading shows the high positive
correlation between reading ability and the IQ, gives a three-page
résumé of the evidence on the mechanics of reading, devotes 19 pages
to reporting a new four-year study of a non-reader, and presents an
interpretation of studies of both non-readers and of cases of special
ability in reading. At this point in the book it seems to me there is
far too little attention given to the recent studies from the
Judd-Buswell laboratory.

An excellent chapter on the psychology of spelling consists of an
analysis of the process of learning to spell and an epitome of the results
of psychological examination of poor spellers. The author takes up
such questions as "Can special defect in spelling be overcome?" and
"Does reading teach spelling?"

A very brief chapter on arithmetic summarizes the data on such
problems as the relation between IQ and ability in arithmetic, mental
functions employed in arithmetical calculation, the organization of
arithmetical abilities, psychological studies of deficiency in arithmetic,
and arithmetical prodigies. (The last named is given more space than
is devoted to all the preceding matters.) Much of this chapter
consists of abstracts and quotations from Thorndike and those
associated with him in the investigation of arithmetic.

The author treats music and drawing in much the same way and
completes her survey by a brief treatment of individuality and
education. Certain miscellaneous special talents are briefly discussed, e.g.
mechanical ability as measured by tests and important social qualities.
The inadequacy of data in these two fields, especially in the latter, is
revealed by the discussion of ability to lead and handle people. The
data of two studies which I have made lead to the tentative conclusion
that there is little correlation between such abilities and verbal
intelligence. This is quite at variance with Dr. Hollingworth's point of view.

Her book will be very helpful to teachers who wish a brief résumé
and interpretation of the psychology of the school subjects.

Harold Rugg.


2. A Strategic Position in Education. "As is the principal so is the
school" is the fundamental theme of a recent textbook¹ on the organization,
administration and supervision of instruction in an elementary
school. For two reasons the appearance of this book is particularly
opportune. The first is because of the substantiating evidence which
it adds to the rapidly growing conviction that the hitherto neglected
and submerged elementary principal is in reality a very important
person in determining the educational tone and general effectiveness
of our schools. The second is the wealth of professional training
material which this book makes available in a field where such material
has been noticeably meagre.

¹ Cubberley, E. P.: "The Principal and His School." New York, Houghton
Mifflin Co., 1923, pp. 570.

The author has so filled his 570 pages with sound educational
theory; pertinent principles of school administration; interesting,
concrete and helpful illustrations; and suggestive and stimulating
problems that the book will commend itself to a wider variety of
readers than the title alone would indicate. No principal, experienced
or inexperienced, can read it without professional improvement
resulting from every chapter. No teacher, regardless of how long he has
taught, can read it, particularly the chapters on supervision, without
receiving innumerable helpful suggestions and an increased
professional zeal which is certain to result in more intelligent and sympathetic
cooperation with the principal of his building. No superintendent or
supervisor can read the clear-cut analysis of the relationship of the
principal to those officers without a better understanding of this
troublesome administrative problem. Chapter XVIII alone, on
"Knowing the School," would justify the addition of this volume to the
professional library of any principal or teacher.

The technique of scientific experimentation in education is yearly
becoming more generally understood and appreciated. The varying
theories of education and proposed methods of teaching are
yearly becoming more numerous and more confusing. The conviction
is growing that many schools will have to apply the test of scientific
experimentation to those conflicting proposals in order to evaluate
them. If this is to be done, the building principal is in the most
strategic position to become a director and guide for such experiments
as integral parts in his larger program of supervision and teacher
growth. This phase of the principal's work seems to be the single
omission of any importance in the comprehensive treatment presented
in this very readable textbook.

E. S. Evenden.

Teachers College,

Columbia University.

3. A Significant Committee Report. The Reorganization of Mathematics
in Secondary Education. This is the most valuable report¹
that has ever been made in the field of secondary mathematics in this
country. It should prove to be epoch-making in its effect, not only
because of its intrinsic worth, but also because it should stimulate
many individual teachers of mathematics and organizations of such
teachers to carry on and extend the investigations so well begun.

The members of this committee were highly trained specialists in
their various fields, either in secondary or higher mathematics. They
were so chosen as to give all sections of the country fair representation
in the report.

The report carries weight especially because, contrary to the usual
custom, the work of this committee was subsidized. Thus, it was
possible for all members to attend its sessions, and to have a hand in
making the final report.

Moreover, the findings of the committee have been influenced by 
suggestions and help from organizations of mathematics teachers 
from all parts of the country. The report therefore represents in a 
broad sense the best thinking on the reorganization of mathematics in 
the United States. 

The report is divided into two parts as follows: Part I, General
Principles and Recommendations. Part II, Investigations conducted 
for the Committee. 

Part I consists of eight chapters dealing with a brief outline of the
report, the aims of mathematical instruction (general principles),
mathematics for years seven, eight, and nine, mathematics for years
ten, eleven, and twelve, college entrance requirements, a list of
propositions in plane and solid geometry, the function concept in secondary
school mathematics, and terms and symbols in elementary
mathematics. It is this part of the report that is likely to prove most helpful
to the classroom teacher of mathematics.

Part II also consists of eight chapters dealing with the present
status of disciplinary values in education, the theory of correlation
applied to school grades, mathematical curricula in foreign countries,
experimental courses in mathematics, standardized tests in
mathematics for secondary schools, the training of teachers of mathematics,
certain questionnaire investigations, and a bibliography on the teaching
of mathematics.

¹ A Report by the National Committee on Mathematical Requirements under
the Auspices of the Mathematical Association of America, Inc., pp. x + 662.
(Obtainable from J. W. Young, Hanover, N. H.)

While Part II of the report may not be so generally helpful to the
classroom teacher, certain parts of it are sure to affect his teaching 
practice and his thinking. 

It may be regretted by some that it was not convenient to carry
the investigations to the stage of "scientific accuracy." To answer
some of the questions raised in the report will take years of careful 
study and further investigation. It is doubtful whether it would 
have been possible to hold such a group together long enough to 
complete a work of this type. It would have involved too great 
personal sacrifice. 

The committee has indicated what the child should be taught,
with some limits as to organization and methods. It still remains for
some one to show what the child is really able to learn with profit in a
given grade. Such a plan is far reaching and must necessarily take
into account the entire question of method as well as content.

This report has made a noteworthy contribution to the field of
secondary mathematics. The individual members of the committee
deserve the thanks and congratulations of all teachers of mathematics.
It is to be hoped that the material of this report may be made available
to all teachers of mathematics in the secondary schools.

W. D. Reeve.


4. A Textbook for College Courses. This is a brief, concise treatment
of the intelligence testing movement. The book¹ is divided into
three parts: Part I, historical and theoretical; Part II, the methods;
and Part III, the results.

The book acquaints the reader with the historical background of the
intelligence test and the theoretical assumptions underlying the work.
It makes him familiar with typical tests, both individual and group,
their general make-up, the uses for which they are best adapted, and
the sources from which they can be obtained. In the final section
are brought together the most important investigations that have
been made in various fields. Separate chapters are devoted to each
type of individual studied, such as "The Blind, The Feebleminded,
The Superior, etc."

A bibliography following each chapter in the book furnishes the
sources of the original material and provides the instructor with a
wealth of supplementary reference.

¹ Pintner, R.: "Intelligence Testing." New York, Henry Holt and Company,
1923, pp. vii + 406.

C. N. COLUOTON, 


5. "A Three-foot Shelf of Tests." Thus does the author¹ of this
handbook on educational measurements play on Dr. Eliot's phrase
and characterize his selection of tests. The book starts out with six
brief chapters on elementary statistics. The measurement of mental
ability is discussed in seven chapters and the remaining seventeen
chapters are devoted to achievement tests in elementary and secondary
schools.


6. Discourses on the Psychological and Pedagogical Implications
of Adolescence.² When title, preface and introduction all lead one to
expect a practical discussion of "problems" it is indeed disconcerting
to be obliged to wade through 356 pages of discourse before
encountering the first concrete incident or statement of an actual problem and its
solution in practice.

It is equally disturbing to note the wide range of topics discussed
in the book and the inadequate treatment of some of the crucial
questions. The quotations from G. Stanley Hall, C. H. Johnston,
G. H. Palmer and scores of other writers abound, but there is not a
footnote in the book and the reader is wearied by the consequent
frequency of such indefinite references as the following: "It is well to
recall Dr. Hall's summary concerning character ..." or "The
Committee of Ten seems to assume ..." Particular works or
pages are not mentioned. The frequent use of data from
investigations with no mention of the data is as disturbing to the critical reader
as the utter lack of such data in the bibliography. For effective use
with prospective high school teachers there is an erroneous assumption
of educational experience and background, with no urge to mental
activity in the form of summaries, suggested problems, illustrative
incidents or case studies.


¹ Hines, H. C.: "A Guide to Educational Measurements." New York,
Houghton Mifflin Company, 1923, pp. 22 + 270.

² Pringle, Ralph W.: "Adolescence and High School Problems." New York,
D. C. Heath and Company, 1922, pp. x + 380.

7. A Substitute for the "Reserve Shelf." Wide reading in connection
with beginners' courses in psychology has been facilitated by this
compilation and topical organization of some 200 selections from the
writings of almost as many outstanding contributors to psychological
thought.¹ The compilers have simplified matters for instructors in
psychology by supplying thought-provoking problems for use as
assignments or exercises in connection with each quotation. If
desired the book can be used in connection with a regular textbook.
It includes widely differing points of view from which assignments can
be made as the purpose, point of view or capacity of the student or
instructor may dictate. The organization and content facilitate
comparative consideration of various viewpoints and thus stimulate
further inquiry, reading and study.


8. A New Book on the Teaching of Reading. A book on this subject
seems superfluous in view of other recent treatments of the same topic.
This notion must be abandoned, however, because of the significance
of the organization of this particular publication² and the consistent
embodiment of a psychological viewpoint in the presentation. Part
I consists of three introductory chapters. The first is on the aims
or objectives of reading instruction. The aims of five teachers, and
those given by several authors and educators are cited. The purpose
of the chapter is stated. Social and historical data upon which a
revision of aim must depend are next included, and specific aims are
outlined. There follows a concise conclusion, then a summary of the
chapter and a list of readings with specific chapter references. The
first paragraph in each successive chapter shows its connection with
the preceding chapter, and then continues the discussion according to
the scheme of the introductory chapter. Both sides of controversial
discussions are presented. Sub-titles are clear forecasts of the content
of each section, and abundant concrete examples and instances are
used to introduce abstract notions or to exemplify principles. The
author shows keen psychological insight into his subject both in his
analysis of pupil activities and in the development of his subject.
He does not err by serious over- or under-emphasis of topics. About
100 pages are devoted to the reading process and it is in this portion
of the book that most of the quotations from scientific studies appear.
Part III is entitled "The Course of Study in Reading." A historical
chapter is followed by a detailed discussion and practical interpretation
of aims and principles by grades. The book concludes with a chapter
on remedial work.

¹ Robinson, Edward S. and Robinson, Florence Richardson: "Readings in
General Psychology." Chicago, The University of Chicago Press, 1923, pp. xvi +
674.

² Wheat, H. G.: "The Teaching of Reading." New York, Ginn and Company,
1923, pp. xi + 340.


9. Impulsive Tendencies Satisfied by Industrial Studies. When one
considers the divergent and conflicting tendencies of the past 20
years, it is a notable fact in the development of industrial arts
education that so few elementary school teachers still attempt to justify
industrial studies and experiences on the basis of disciplinary values.
These elementary studies of present day industries should share the
responsibility with other school subjects for helping pupils to develop
appreciative insight and reasoning power in terms of significant
interests and actual life needs. These studies are fast replacing the
so-called "busy work" or handwork emphasis in industrial courses.
Those who believe that children from six to twelve years of age are
concerned mainly with situations and activities similar to those in
which adults are engaged, rather than series of formal models or
abstract constructions, will welcome the new book¹ by Professors
Bonser and Mossman. It is a contribution to the literature of
industrial arts education. It contains a wealth of suggestive material and
makes provision for concrete "outcomes" at the close of each of the
several units included for the six elementary grades. For some years
both authors have tried out under varying conditions the principles
of selection and organization which they have proposed in Part I
of this volume. The results of their experience and observations at
the Western Illinois State Normal School, at the Speyer School of
Columbia University, and in connection with the work of a large
number of cooperating teachers in different public school systems are
presented in Part II. This deals with the application of these principles
to the activities in which the children are expected to engage.

Throughout the book "constructive work is presented as a means
of awakening intellectual inquiries, of giving meanings and values, of
cultivating appreciations, and of leading on to further interests on
levels higher than those apparent at the beginning of the
constructions." The introduction to a chapter dealing with the psychology
of the industrial arts calls attention to marked instinctive tendencies.
Such examples are given as the impulse to manipulative activity, the
impulse to investigate, the artistic or aesthetic impulse, and the social
impulse. In this broader interpretation of instinct, the authors have
had the following in mind: "The impulses of children to investigate
and to enjoy, or appreciate, have been utilized quite as much as the
impulses to manipulate and construct."

This book can be used advantageously as a handbook by
progressive elementary school teachers as well as by those students who are
preparing to teach in the elementary grades.

A. H. Edgerton.

¹ Bonser, F. G. and Mossman, L. C.: "Industrial Arts for Elementary Schools."
New York, Macmillan Company, 1923, pp. xi + 491.


10. Movies in the Schools. To the superintendent, principal or
teacher who wants to use motion pictures in the school but does not
know exactly how to go about it, "Motion Pictures in Education"¹
will prove of real value. Excellent advice is furnished on technical
details of projection; and sources of films are indicated. The book
will effectively short-circuit the long process of trial and error through
which many of us have had to pass in visual instruction.

Modern educational theory would take exception to much of
the philosophy of the book. Education is defined in one place as "the
imparting and acquisition of knowledge;" in another place as "the
harmonious unfolding of all the faculties." Memory is considered as
a "faculty" capable of general development. Several of the
Pedagogical Principles on page 161 are open to serious criticism. But the
suggestions for the use of films in the class-room are much better than
the philosophy, and will be of great service to teachers who are
struggling to evolve a methodology for the use of this newest of teaching
devices.

Edwin H. Reeder.


11. Teaching Composition in France. Phyllis Robbins' translation
of Mr. Bezard's intimate account of what he says and does when teaching
"Class 2" boys (16 to 18 years of age) to write will particularly please
those whose interest in French composition teaching was aroused
several years ago by Rollo Walter Brown's How the French Boy
Learns to Write. In this new book,¹ however, one's interest is likely
to center finally in Mr. Bezard, the teacher, rather more than in Mr.
Bezard's method. The method, if Mr. Bezard is left out, seems
traditional, academic, and what we in America are inclined to call
unpsychological. But the teacher in this case has the breadth of view, the
sensitiveness to boyish life, the facility of mind, and, above all, the
good sense to make any method work. Most of us, at any rate,
would like to write as well as do the boys whose themes or parts of
themes appear in this volume.

¹ Ellis, D. C. and Thornborough, L.: "Motion Pictures in Education." New
York, Crowell Company, 1923, pp. xvii + 284.

Maybe, too, the American teacher of composition might profitably
consider again, in the light of Mr. Bezard and his success, the validity
of such ideas as these: The teacher of composition should be a person
of outstanding culture and of wide interests. He should be able to
write well himself and to demonstrate this for his pupils. Reading
experiences may be more valuable for theme material than "personal
experiences." The building of character should be a very conscious
aim in composition teaching. Classic models are well worth imitation.
Themes should be less frequent and more discursive.

M. H. Willing.


12. A Language Text. Book Three of the Cowan, Betz, Charters²
language series is much more practical in its subject-matter for junior
high school use than most of its kind. It stresses functional grammar
and ordinary good usage. Tests, reviews, and drill exercises are
numerous, but, as must always be the case under book limitations,
not nearly numerous enough and not nearly effective enough in
design.

M. H. Willing.


¹ Bezard, J. Translator, Robbins, P.: "My Class in Composition, A Teacher's
Diary." Cambridge, Harvard University Press, 1923, pp. xxxi + 268.

² Cowan, E. M., Betz, A. and Charters, W. W.: "Essential Language Habits."
New York, Silver, Burdett and Company, 1923, pp. viii + 439.

ON THE IMPROVEMENT IN INTELLIGENCE
SCORES FROM FOURTEEN TO EIGHTEEN*

E. L. THORNDIKE 

Institute of Educational Research, Teachers College, Columbia University

Adults measured by the intelligence tests in common use attain
scores little if any above those made by 14-year-old children in school;
and this has given rise to the conclusion that in general intelligence
does not improve after about that age. On the other hand, whenever
repeated measurements have been made over an interval of a year or
more upon the same individuals initially 14 or 16, there has been a
marked improvement. So in the case of Woolley ('14 and '16), Brooks
('21), Cobb ('22), and others. The data from repeated measurements
need, however, some allowance for the special practice in taking the
tests themselves, and the amount of this has not been known with
surety.

We have been able to make very extensive measurements a year 
apart and to measure the allowance to be made for the special practice
adequately, in the case of the sort of person who goes to high school and 
stays there for at least a year and a half. 

In May, 1922, 4473 pupils in Grade IX, 4304 in Grade X and 3644 in
Grade XI in various high schools were tested with an examination
representing a composite of recognized group tests of intelligence.
In May, 1923 the pupils in Grades X, XI, and XII of the same schools
were tested with an alternate form of the same examination. 2790 +
3136 + 2638 of the original 4473 + 4304 + 3544 were among those
measured in 1923, representing (except for temporary absences or
removals to other cities) those who had continued in school a year.

* The investigation reported here was made possible by a grant from the
Commonwealth Fund.

Schools 4-16 had Form A of the examination in 1922 and Form B
in 1923; schools 21-26 had Form B in 1922 and Form A in 1923. In
general the median gain for '23 over '22 for pupils who took both
examinations is 26.4 ± .48 (PE) when the order was A→B and 19.8 ± …
(PE) when the order was B→A. If we assume that the real gains of
pupils in schools 21-26 were equal to the real gains of pupils in schools
4-16, B is 3.3 points (±.41) easier than A.
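
The arithmetic behind this first estimate may be sketched as follows. The sketch assumes, as the text does, that the real gain is the same in both groups of schools, so that half the difference between the two observed median gains measures how much easier Form B is. The function name `equate_counterbalanced` is introduced here only for illustration.

```python
def equate_counterbalanced(gain_ab: float, gain_ba: float) -> tuple[float, float]:
    """Split two observed median gains into (real_gain, ease_of_b).

    gain_ab: observed gain when Form A was taken first and Form B second.
    gain_ba: observed gain when Form B was taken first and Form A second.
    Assumes the real gain is equal in the two groups of schools.
    """
    # Observed A->B gain = real gain + ease of B.
    # Observed B->A gain = real gain - ease of B.
    ease_of_b = (gain_ab - gain_ba) / 2
    real_gain = (gain_ab + gain_ba) / 2
    return real_gain, ease_of_b


# Median gains reported in the text for the two orders of forms.
real_gain, ease_of_b = equate_counterbalanced(26.4, 19.8)
print(round(real_gain, 1), round(ease_of_b, 1))  # 23.1 3.3
```

The second value reproduces the 3.3 points stated above; the first is the gain implied by the same assumption before any further allowance is made.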

An independent method of equating them is by comparing the
scores of pupils taking Examination A and Examination B, in both
cases as the first trial. In 1923 in schools 4 to 15 there were many
pupils tested with Examination B in Grades XI and XII who by reason
of absence or change of residence had not taken Form A, the 1922
examination. On the average we may expect the Grade X pupils
who thus took Examination B as their first trial in 1923 to be of nearly
equal intellectual ability¹ with the Grade X pupils in 1922 who took
Examination A. The data available appear in Table I. Weighting
the B - A differences by the smaller of the two populations involved we
have [value illegible] as the average. Weighting by the square root of
the smaller of the two populations involved we have 1.1 as the average.
We may take (2.26 + 1.14)/2, or 1.7, as a conservative probable error
for it.
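
The two weighting schemes just described can be sketched as follows. This is a minimal illustration, not the author's computation: the function name is hypothetical and the rows are made-up round numbers, not the Table I entries.

```python
from math import sqrt


def weighted_mean_difference(rows, use_sqrt=False):
    """Average B - A differences, weighting each by the smaller population.

    rows: iterable of (b_minus_a, n_form_a, n_form_b).
    With use_sqrt=True the weight is the square root of the smaller n,
    which damps the influence of the largest groups.
    """
    numerator = 0.0
    denominator = 0.0
    for diff, n_a, n_b in rows:
        weight = min(n_a, n_b)
        if use_sqrt:
            weight = sqrt(weight)
        numerator += weight * diff
        denominator += weight
    return numerator / denominator


# Hypothetical rows (difference, n for Form A, n for Form B).
example_rows = [(4.0, 500, 100), (-2.0, 500, 25)]
by_n = weighted_mean_difference(example_rows)
by_sqrt_n = weighted_mean_difference(example_rows, use_sqrt=True)
```

With these illustrative rows the square-root weighting pulls the average toward the smaller group's difference, which is the practical effect of the second scheme.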


Table I.— Median Scores in Form A and Form B of Pupils Taking the
Examination as a First Trial

School      Grade   Form A        n             Form B        n       B - A
9           IX      [illegible]   large         151           163     [illegible]
11          IX      169           [illegible]   [illegible]   98      1
4 to 11     X       178           large         176           104     -2
4 to 11     XI      106           large         [illegible]   86      [illegible]
12          X       194           large         212           131     18
12          XI      206           large         199           86      -6
13          X       189           large         197           69      8
13          XI      204½          large         200           60      -4½
14          X       176           large         163           78      -12
14          XI      182           large         186           53      3
16          X       196           large         185           38      -11
16          XI      206½          large         166           20      -31½
21 to 26    IX      164           173           160½          large   [illegible]
21 to 26    X       190           744           188½          large   -1½
21 to 26    XI      163           267           190           large   0

¹ Moving from one city to another is probably indifferent to intellect; being
absent from school is perhaps slightly negatively related to intellect.
B then may be taken to be about 2¼ points easier than A.
Subtracting 2¼ from each individual’s gain who took the tests in the
order A→B and adding 2¼ to the gain of each individual who used
the order B→A we have 22½ as the median gain and 23 as the average
gain.

A certain amount of the gain made during the year is probably
due to the special practice with the examination itself, although
fore-exercise was given in both ’22 and ’23 to reduce this practice
effect. We have measured it as follows: We have, as just stated,
many pupils in Grades X, XI and XII in 1923 in schools 4-16 who
took Examination B as their first trial, by reason of absence in 1922.
They are presumably on the average nearly equal in intellect to those
who were present both in ’22 and ’23 and so took Examination B as
their second trial. The average difference in favor of the latter may
then be taken as the result of the first trial’s experience. Similarly
for schools 21-26 with Examination A. We have made the
computations with the results shown in Table II.

To have the amount due to the growth and training from May ’22
to May ’23 without this special practice effect of second over first
trial we then subtract 11.9 from 22½ or 23, leaving 10.6 or 11.1.
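
The subtraction above, and the averaging that produces the practice allowance, can be sketched as follows. The function names are hypothetical, and the second pair of medians in the example is invented for illustration; only 11.9, 22½ and 23 are taken from the text.

```python
def practice_effect(first_trial_medians, second_trial_medians):
    """Mean advantage of second-trial groups over matched first-trial groups."""
    advantages = [second - first
                  for first, second in zip(first_trial_medians, second_trial_medians)]
    return sum(advantages) / len(advantages)


def net_gain(gross_gain, practice_allowance):
    """Gain attributable to growth and training once practice is removed."""
    return gross_gain - practice_allowance


# Figures stated in the text: median gain 22.5, average gain 23, allowance 11.9.
net_median = net_gain(22.5, 11.9)
net_average = net_gain(23.0, 11.9)

# Illustrative only: one pair of medians plus one hypothetical pair.
example_allowance = practice_effect([176.0, 200.0], [184.6, 215.0])
```

The allowance is an unweighted mean of group advantages in this sketch; whether the original computation weighted the groups is not stated.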


Table II.— Median Scores of Pupils Taking the Same Form of the
Examination as First Trial and as Second Trial

Schools   Grade   Form   First trial   Second trial   Advantage of second trial
4-11      X       B      176.0         184.6           8.6
4-11      XI      B      200.4         215.0          14.6
4-11      XII     B      207.2         222.8          15.6
12-16     X       B      194.6         208.3          13.7
12-16     XI      B      195.3         210.3          15.0
12-16     XII     B      216.6+        203.8-         14.2
21-26     X       A      190.3         190.0           -.3
21-26     XI      A      193.3         213.5          20.2
21-26     XII     A      213.7         219.3           5.6

Average, 11.9
What this average gain of 11.1 in a year amounts to may be realized
from the fact that it is about one-third of the mean square deviation
of individuals in Grades IX, X, and XI, in first-trial score with the
examination or about one-half of what their mean square deviation
would be if they were perfectly measured. Since the variability of
the high school population in Grades IX to XI may be estimated to
be at least half that of all 14-year-olds, and since the mean square
deviation of all 14-year-olds may be estimated as 23 months of mental
age, measured by Stanford Binet or about 21 months if perfectly
measured, our gain may be set as equivalent to at least 10 months of
mental age around 14. It is thus a gain of considerable magnitude.

The gain is very closely the same for pupils in Grade IX, in Grade
X, and in Grade XI, in 1922, the medians being 22.4, 23.6 and 23.4 for
the entire gain, or 10.5, 11.7, and 11.5 if 11.9 is subtracted from each
for the effect of special practice with the tests. Any decrease in gain
with age, if such there be, is offset by the selection of those
more capable of gain.
THE RELATIVE PREDICTIVE VALUES OF CERTAIN
INTELLIGENCE AND EDUCATIONAL TESTS
TOGETHER WITH A STUDY OF THE
EFFECT OF EDUCATIONAL
ACHIEVEMENT UPON
INTELLIGENCE
TEST SCORES

ARTHUR I. GATES AND JESSIE LA SALLE¹

Teachers College, Columbia University


SUMMARY

This study is based upon 75 pupils tested during two school years
at intervals of four months with a battery of achievement tests, and
twice at an interval of 12 months with the Stanford-Binet and the
National Intelligence Tests. The main results are as follows:

1. The obtained correlations between the National Intelligence
Tests and achievement tests, and the intercorrelations of achievement
tests decrease steadily as the intervals increase from 0 to 20 months.

2. The Stanford-Binet predicts achievement 20 months later about
as well as for shorter periods.

3. When corrections for the unreliability of tests are made, it
appears that the best means of predicting achievement in a particular
subject, reading, spelling or arithmetic, is to use a test of that subject
itself.

4. The usefulness of an intelligence test, because of the universality
of its currency, is indicated, however.

5. That the National Intelligence Test reflects in a measure the
effects of information and skill progressively accumulated in school is
indicated by a positive correlation between gains in National Intelligence
Test and gains in achievement during a period of 12 months.

6. That the Stanford-Binet reflects slightly, or not at all, the effects
of schooling under the conditions of the experiment is indicated by
zero correlations between gains.

7. Analyses by means of the correlation of columns of a
correlation table (Spearman's criterion) and by means of partial
correlations agree in suggesting that intelligence is not a quality, everywhere
one and the same, but a composite or average of many abilities
variously related.
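
Items 5 and 6 rest on correlating each pupil's gain on an intelligence test with the same pupil's gain in achievement over the same 12 months. A minimal sketch of that computation is given below. The scores are synthetic and the function names are hypothetical; nothing here reproduces the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def gains(first_scores, second_scores):
    """Per-pupil gain between two testings of the same pupils."""
    return [second - first for first, second in zip(first_scores, second_scores)]


# Synthetic scores for five imaginary pupils (not data from the study).
nit_first, nit_second = [80, 95, 110, 120, 135], [98, 108, 131, 136, 160]
ach_first, ach_second = [30, 36, 41, 47, 52], [36, 40, 49, 52, 62]

r_gains = pearson_r(gains(nit_first, nit_second), gains(ach_first, ach_second))
```

A clearly positive value of this coefficient is what item 5 reports for the National Intelligence Test; a value near zero is what item 6 reports for the Stanford-Binet.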

¹ Based on data secured in the Scarborough School, Scarborough, N. Y. during
the academic years 1920-1922.
The General Problem and Methods

The intelligence test and scale, the invention of which is accredited
to Binet, and tests and scales of scholastic achievement which
originated with Thorndike, have enjoyed a parallel and, until recently,
a somewhat independent growth. The group tests of intelligence
developed out of the superior technique of the achievement tests and
embraced in no small measure identical content. Nevertheless, it
has been quite commonly assumed that intelligence tests measure one
type of thing, namely, native ability or inborn capacity, whereas the
scholastic tests measure another thing, school achievement or acquired
ability. The wide use of the accomplishment ratio is substantial
testimony of this fact.

This paper presents an inquiry into several pertinent relations
between typical representatives of three types of tests, the individual
Stanford-Binet Intelligence Scale, the National (group) Intelligence
Test and several group tests of scholastic achievement.

The function of the intelligence test is prediction. The fact that
intelligence tests give substantial correlations with achievement in
school subjects measured at the same time (a fact to which nearly all
of our studies are confined) is alone insufficient justification for the
use of the intelligence tests. It may be that future success in school
subjects is predicted more accurately by scholastic tests themselves
than by intelligence tests. This possibility, at any rate, will be one
of the subjects of the present inquiry: the relative predictive value of
intelligence and educational tests.

The subjects utilized were pupils mainly from Grades III, IV, V
and VI of the Scarborough School in the academic year 1920-21 who
also completed all of the work during the year 1921-22, about 75 in all.
They were given the Stanford-Binet once in each year, mainly during
the first semester; the National Intelligence Test once in October of
each year, and a battery of tests in reading comprehension (Thorndike-
McCall); reading rate (two forms of the Courtis or Burgess or one of
each); arithmetic (all four forms of the Woody) and spelling (60 words
from the Ayres Scale). The achievement tests were given in October,
January, and May of each year, i.e., tests at intervals of approximately
four months during two school years. From these data, it will be
possible to compute the correlations between tests given at the same
time or at intervals of 4, 8, 12, 16 or 20 months.

The National Intelligence Test (hereafter designated by NIT)
given in 1920 was correlated with six tests of comprehension in reading,
one each in October, 1920; January, May, and October, 1921; and
January and May, 1922, and similarly with six tests each for rate of
reading, arithmetic and spelling, making a total of 24 coefficients.
There were 24 coefficients, also, between the NIT given in 1921, and
the several scholastic tests, or 48 for both. Similarly there were 48 r’s
between Stanford-Binet Mental Ages (MA) and the educational tests.
The intercorrelations of the achievement tests are more numerous: 36
between each group of six tests for each subject, making in all 432
coefficients.
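
The counts of 24 and 36 follow from pairing testing occasions, and the same pairing shows how many coefficients fall at each interval. The sketch below assumes the nominal four-month spacing described in the text (six occasions at 0, 4, 8, 12, 16 and 20 months after October, 1920); the function names are hypothetical.

```python
from collections import Counter
from itertools import product

# Six testing occasions, in months after October 1920 (nominal spacing).
OCCASIONS = [0, 4, 8, 12, 16, 20]
SUBJECTS = ["comprehension", "rate", "arithmetic", "spelling"]


def predictor_intervals(predictor_month, occasions=OCCASIONS):
    """Intervals between one predictor testing and each achievement testing."""
    return Counter(abs(month - predictor_month) for month in occasions)


def cross_test_intervals(occasions=OCCASIONS):
    """Intervals when every occasion of one test meets every occasion of another."""
    return Counter(abs(a - b) for a, b in product(occasions, repeat=2))


nit_1920 = predictor_intervals(0)    # predictor given at the first occasion
nit_1921 = predictor_intervals(12)   # predictor given a year later
per_pair = cross_test_intervals()

coefficients_per_predictor = len(SUBJECTS) * sum(nit_1920.values())
coefficients_per_test_pair = sum(per_pair.values())
```

Under this spacing a predictor given in October, 1921 can only be paired at intervals of 0, 4, 8 and 12 months, which is why the 1921 tables have fewer columns than the 1920 tables.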

Since the group embraced the range of ages found in four grades,
the correlations will be relatively high, but since the children themselves
in this school were selected, few having an IQ of less than 100,
the coefficients will, by this means, be reduced. What the coefficients
would be in the case of an unselected group of the same age measured
by perfectly reliable instruments could be estimated only at the cost
of great labor and then only roughly. At present, the method of
correlation serves its purpose most fully in portraying relative associations.
This may be accomplished in the present study with a high
degree of reliability since the same subjects, and no others, are used
throughout. The reader will understand that the correlations
presented below mean little except as one is compared with others
obtained in this study, and even this statement demands certain
qualifications which will be made in due time.

Predictions for Varied Periods of Time. The first question is
concerned with the influence of an interval of time between tests upon
the correlations. In Tables I, II and III are given the correlations
arranged according to the intervals between tests. Table IV is a
summary of Tables I, II and III.

Table I.— Correlations of National Intelligence Test, October, 1920 with
Scholastic Tests

Interval in months       0       4       8       12      16      20
Reading comprehension    .8[?]   .82     .80     .82     .73     .72
Reading rate             .83     .82     .82     .82     .76     .72
Arithmetic               .91     .89     .91     .91     .83     .78
Spelling                 .86     .78     .82     .84     .84     .80
Mean                     .868    .828    .838    .848    .790    .755
Correlations of National Intelligence Test, October, 1921 with
Scholastic Tests



Table II.— Correlations of MA 1920 with Scholastic Tests


Interval in months

0 

4 

8 

12 

16 

20 

Reading comprehension..

, .74 

.73 1 

.78 

.72 

.71 

.76 

Reading rate. 

.66 


.00 

MilM 

1 .06 

.08 

Arithmetic.

.77 

.33 1 

.86 


1 .8>i 

.83 

Spelling. 

.00 

.63 ' 

.61 

■■ 


.56 

Mean..... 

.60 

.723 

.726 

.718 

.703 

.713 


Correlations of MA 1921 with Scholastic Tests ¹


Interval in months

0 

4 

8 

12 


Reading comprehension.

.78 

.72 

.72 

.77 





.70 



Reading rate.



.00 

.03 





.06 



Arithmetic.

.83 

.88 

.70 

.72 




.83 

.81 



Spelling.

.02 

,60 

.68 

.60 




.80 

.00 



Mean.. 

.716 

.717 

.710 

.078 



¹ Not all of the Binet Tests were given in the same month; they were grouped in
the modal month, and IQ’s multiplied by CA's of that month to give the MA.
Table III.— Correlation of Reading Comprehension with Reading Rate
Intervals in Months

Same time   4 months   8 months   12 months   16 months   20 months


.09 I 

.70 

,70 1 

.70 

.70 

1 1 

.74 1 

.71 1 



.7« 

,77 

.«» 1 

.70 

.71 

.GO 1 

.08 1 



.7A 

.70 

,HO 1 

.00 

.71 

.00 

.00 ' 



.77 ! 

.OR 

.70 1 

.00 1 

,78 

.07 ' 




,07 

.00 

.00 ' 



.02 




.73 

... 

... 


,., 

.71 




.732 ! 

.711 

1 .700 1 

1 .087 

.068 

.636 

PE.

I .011 , 

.011 

.012 

.Oil 1 

.011 1 

.010 

_ __ .... 

L 


. 

-- 

-- 





Correlations of Reading Comprehension with Arithmetic


.70 

.If, 

.76 

.8-1 

.86 

.78 

.72 

,76 

.70 

.74 

.73 

.80 

,78 

.81 

.83 

.71 

.74 

.86 

.72 

.73 

.78 

.70 

.70 

.70 

. . . 

.74 

,66 

.73 

.70 

.76 

.84 

■ 

.05 

.71 

,703 

.701 

.703 

.740 

.713 

.03 

.000 

.008 

.011 

.013 

.013 

.014 


Correlations of Reading Comprehension with Spelling


.72 

.71 

.70 

.70 

.70 

,70 

.00 

.00 

.73 

.70 

.72 

.78 

.01 

,70 

.00 

.08 

.74 

.72 

.71 

.60 

.73 

,00 

.70 

.80 

.00 

.70 

.03 

.72 

.00 

.00 


.70 

.61 

.61 



.06 



.70 





.03 



.733 

.708 

.71)1 

.080 

.078 

.07 

.010 

.008 

.010 

.006 

.010 

.006 


Correlations of Reading Rate with Arithmetic



.78 

.66 

.75 

.73 

.70 

.72 

.02 

.60 


.OH 

.70 

.7.1 

.71 

.70 

.01 

.63 

.08 


.00 

.07 

.05 

.03 

.08 

.57 

.73 



.76 

.03 

.71 

.58 

.70 

.71 

.07 



.73 

.09 

.03 



.71 




.70 





.08 



Mean. 

.72.6 

.088 

,578 

.072 

.038 

,02 

PE.

.016 

.101 

.011 

.017 

.026 

.020 

Correlations of Reading Rate with Spelling


.76 

.60 

.72 

.72 

.72 

• 

,72 

.73 

.00 

.03 


.78 

.73 

■ 7(i 

.71 

.73 

.6(1 

.68 

.01 

.08 


.72 

.70 

.77 

.60 

.70 

.63 

.65 

.73 



.711 

.67 

.71 

,00 

.07 



.00 



.83 

.08 

.08 








.03 









Mean.

.735 

.705 

.70 

,070 

.003 

.055 

PE.

.012 

.007 

.013 

.010 

.015 

.013 
Table III.— Continued.

Correlations of Arithmetic with Spelling



.73 

.71 

.78 

.72 

.78 

.78 

.80 

.04 


.81 

,80 

.77 

.71 

.72 

.70 

.73 

.70 


.82 

.88 

.71 

.00 

.81 

.84 

.70 



.72 

.72 

.80 

.72 

.08 

,72 

.68 



.70 

.73 

.78 



.70 




.77 

... 

... 

... 

... 

.00 




.718 

.730 

.714 

.70 

.70 

.07 

PE. 

.012 

.000 

.000 

.006 

.013 

.011 


Table IV.— Summary of Tables I, II and III

Showing the average correlations of tests indicated in first column at the left with
the individual achievement tests

0 months   4 months   8 months   12 months   16 months   20 months

National Intelligence
Test..


.828 

.824 

.820 

1 

.766 

MA. 


.720 


■a 


.713 

Comprehension.


.729 

.724 

■a 

mm 

.602 

Rate. 




.070 


.037 

Arithmetic. 

.764 

.727 

.718 

.707 

■a 

.060 

Spelling.

1 .737 

.714 


.080 

.080 

.005 


A survey of the detailed data or the summary will disclose one
important fact: All tests except the Stanford-Binet show a gradual
and fairly uniform decrease in the correlations with achievement as
the tests become more widely separated in time. While it starts with
a markedly lower correlation than the NIT and slightly lower than the
single scholastic test, MA "holds up" better; in fact, it maintains a
level of association. It correlates with achievement 20 months
hence quite as well as with achievement at the time it was given.

Despite the steady decrease in the correlations between NIT and
achievement with the increase in the intervals between the tests, the
NIT shows at 20 months a correlation with those subjects that is higher
than that given by the MA. Judging from the general trend, however,
it appears that the former may, sooner or later, descend to or below
the predictive value of the latter. This possibility, however attractively
implied in the data, should not be assumed as certainty.
Any single achievement test gives a very fair correlation with
achievement in other functions, on the average, although not as good,
after 16 months or more, as either of the intelligence tests. If, however,
we make up a kind of imitation NIT by combining the scores of
all achievement tests, and correlate them with single measures of
achievement, as was done both with NIT and MA, we secure an equally
good predictive instrument. A sampling of the results thus obtained
is as follows:


Table V.— Correlation of Four Equally Weighted Educational Tests
with Single Achievement Tests Selected at Random

Same time   4 months   8 months   12 months   16 months   20 months

.86 

.80 

.80 


.83 

.79 

.88 

.85 

.87 

IH 

.80 

.75 


These r’s are too few to yield very reliable results, but they suggest
correlations at least as high as those given by NIT and higher than
those by MA.

How Well Does an Achievement Test Predict Itself? The next
consideration is: How well does reading or arithmetic predict itself?
Does arithmetic predict itself as well as the MA, NIT or a composite of
several school subjects predict it? The data are given in Table VI.

It should be noted that the self-correlations are based on tests 
given at intervals of four months or more, since but one test was given 
at a time for each function. In comparing the self-correlations with 
the intercorrelations of Table IV, the first columns in the latter table 
should be disregarded. 

A comparison of the means of Table IV and the means of Table VI
will disclose the fact that a test in comprehension or rate of reading,
arithmetic or spelling will predict future achievement in the same
function better than any one will predict future attainments in any
other function: the average of the self-correlations is .826 compared
to .702 for the intercorrelations of the several tests, or averaging the
self-correlations of tests separated by 20 months, we obtained a
mean coefficient of .802 as compared to a similar mean of the
intercorrelations of .665.

The tests of comprehension and rate of reading here used, however, 
predict themselves 16 or 20 months later slightly if at all better than 
Table VI.— Showing the Self-correlations for the Several Scholastic
Tests

Reading comprehension with reading comprehension      Reading rate with reading rate

Intervals between tests in months      Intervals between tests in months

4 

8 

12 

10 


4 

8 

12 

10 

20 

' 

.77 

.78 

.70 

.70 

.76 1 

.80 

.SO 

.82 

.78 

.74 

.70 

.74 

.78 

.70 


,78 

.77 

.00 

.07 


.80 

.80 

.77 


i 

.80 

.73 

.60 



.Yfl 

.77 


WM 


.74 

.78 




.80 

.... 


HH 

H 

.80 1 





Mean .70 

.77 

.76 

.70 

.76 

.80 

.77 i 

.73 

EM 

.74 

PE .010 

.008 

.014 

0 

H 

.012 

.009 

.02-1 


0 


Arithmetic with arithmetic      Spelling with spelling


4 , 

8 

12 

16 

20 

4 

8 

12 

10 

20 

:oi 

n 

.01 

.89 

.84 

,01 

.03 

.88 

.91 

.88 

.88' 

is 

.80 

.80 


.00 

.88 

.87 

.85 


:- I , .01 

mm 

.89 

IHH 

B 

Kg 

.91 

M 



,, , .89 - 


■HH 

mu 

B 


.80 




.06 

m 

M 


m 

m 





Mean ,91 

.88 

.80 

.88 

.84 

.00 

.00 

.87 

.88 

.88 

PE .011 

.006 

,008 

.008 

0 

.004 

.010 

.006 

.014 

0 


the NIT or the Stanford-Binet (or a composite of all of the educational
tests), but the test of arithmetic predicts itself slightly, and the test of
spelling appreciably more accurately than the intelligence tests, as
shown in the coefficients repeated below in Table VII.

Under the conditions of the present study, the practical
recommendations concerning the use of the tests here studied would be
as follows:

1. To predict future scholastic achievement in general for a period
of 16 months or less use the NIT or a composite score obtained by
combining the results of the educational tests.
Table VII

16 months      20 months



mm 

mm 

■Pi 





,70 

.66 

.73 

.72 

.68 

.74 





.83 

.84 

.88 

,78 

.83 

.84 





.84 

.01 

.88 

.80 

.60 

,88 





2. To predict future achievement in general at a period of 20
months, the Stanford-Binet and the NIT are equally good. Probably
but not certainly for periods more than 20 months later the Binet will
yield the higher correlation.

3. To predict achievement 16 or 20 months later in arithmetic or
spelling specifically, use the achievement tests themselves. For
predicting comprehension in or rate of reading, the specific reading
tests are no better than the NIT and but little better than the
Stanford-Binet.

4. The Stanford-Binet differs from the other tests in showing no
decrease in correlations with educational achievements, in general, as
the intervals between tests increase from zero to 20 months.

The Correlations between Several Tests Rendered Equal in Reliability.
The correlations used in the preceding discussions were those
actually obtained. The magnitudes of those correlations depend
not only upon the intimacy of the associations of the functions but also
upon the accuracy with which the functions are measured by the
particular tests used. Other things being equal, the more reliable the
test the higher the correlation. If we wish, therefore, to ascertain how
closely intelligence, as represented in these tests, correlates with the
type of reading, arithmetic, or spelling tested, or how well these types
of achievement correlate with each other, we must make allowances
for the differences in reliability of the instruments; we must make them
equally reliable.

If a test were entirely reliable, two forms of it should give the same
relative positions when applied with a short interval between to the
same group of subjects, i.e., the correlations between the two tests
would be 1.00. Insofar as this self correlation is less than unity, the
tests are relatively unreliable. The simplest way, statistically, to make
all our tests equally reliable is to make certain corrections which make
them perfectly reliable, i.e., to yield a self correlation of 1.00. The
device for this purpose is the formula for correction for attenuation.¹
To correct for attenuation, it is necessary first to secure the correlations
of a form of a test with another form, or the same form, given at a
brief interval, a correlation usually called the reliability coefficient. In
this investigation, the shortest interval between two tests of the same
function was four months, followed by others at four-month intervals.
What the correlation would be between two tests given at shorter
intervals or at approximately zero interval must be estimated. This
was done by computing the average decrease in the self correlations
for a four months period by comparing the several intervals 4-8
months, 8-12, etc. This average amount has been added to the average
of the self correlations at a four-month interval, giving us, at least
approximately, the self correlation of tests given in immediate succession,
that is, the reliability coefficient, which is given below for each
test.

Reliability Coefficients

Reading, comprehension . . . . . .80
Reading, rate . . . . . . . . . . .81
Arithmetic . . . . . . . . . . . . [illegible]
Spelling . . . . . . . . . . . . . [illegible]
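
The two steps described above, extrapolating a zero-interval reliability from the four-month self-correlations and then correcting an obtained correlation for attenuation, can be sketched as follows. The correction uses the standard form, the obtained correlation divided by the square root of the product of the two reliabilities. The function names and the numerical inputs are illustrative assumptions, not the study's values, apart from the .93 that the text accepts for the intelligence tests.

```python
from math import sqrt


def zero_interval_reliability(self_correlations):
    """Extrapolate the self-correlation back to a zero interval.

    self_correlations: mean self-correlations at 4, 8, 12, ... months, in order.
    The average drop per four-month step is added to the four-month value.
    """
    drops = [earlier - later
             for earlier, later in zip(self_correlations, self_correlations[1:])]
    average_drop = sum(drops) / len(drops)
    return self_correlations[0] + average_drop


def correct_for_attenuation(r_xy, reliability_x, reliability_y):
    """Correlation expected if both measures were perfectly reliable."""
    return r_xy / sqrt(reliability_x * reliability_y)


# Hypothetical self-correlations at 4, 8, 12, 16 and 20 months.
reliability = zero_interval_reliability([0.88, 0.86, 0.85, 0.83, 0.80])

# Hypothetical obtained correlation of an intelligence test (reliability .93)
# with an achievement test having the reliability estimated above.
corrected = correct_for_attenuation(0.72, 0.93, reliability)
```

Because both reliabilities are below unity, the corrected coefficient is always larger in absolute value than the obtained one; with unreliable measures it can even exceed 1.00, which is a known limitation of the device.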


For the intelligence tests, lacking crucial data but taking into
account the results of certain other studies, the actual self-correlations
at an interval of a year (.93 in both cases) are accepted as
approximately correct reliability coefficients.

In Table VIII are given the correlations which would be obtained
if all of the tests were equally reliable. In column (1) are given the
corrected correlations for tests given at an interval of 16 months; in
column (2) at an interval of 20 months, and in the last column the
mean of both periods, to secure the most reliable coefficients.

¹ This is discussed by T. L. Kelley in his "Statistical Method," New York:
Macmillan, 1923, pp. 204f.
Table VIII.— Correlations Corrected for Unreliability of Measures

Intervals      (1) 16 months      (2) 20 months      Average of (1) and (2)

NIT with comprehension.

.866 

.870 

.900 

.020 

.888 

.843 

.833 

.846 

.877 

.850 

.840 

.860 

.873 

.898 

.860 

NIT with arithmetic.



MA with comprehension.

.831 

.762 

.022 

.009 

.704 

.878 

.787 

.911 

.040 

.800 

.864 

.709 

.916 

.667 

.799 

MA with arithmetic. 

MA with spelling.

Mean of above.

Comprehension with comprehension.

.059 

.901 

.064 

.074 

.047 

.947 

.013 

.022 

.974 

.030 

.053 

.907 

.938 

.974 

.043 

Arithmetic with arithmetic.

Spelling with spelling.

Mean of above.


With perfectly reliable tests, correlations are obtained which
differ in certain respects from those actually obtained, as shown in
Tables IV and VI. The NIT shows correlations of nearly equal magnitude
with the four school subjects whereas the Binet predicts most
accurately achievement in arithmetic, with comprehension a close
second, rate third and spelling a low fourth. The corrected coefficients
also show that the several functions, had the instruments of measurement
been perfect, would predict themselves 16 or 20 months later
nearly equally well. At the twenty-month interval, the self correlations
are already higher than the predictions made by NIT which are, in
turn, very slightly higher than those of MA.

With better instruments for measuring achievement, then, the
test of a school subject itself is the best predictive measure of future
achievement in that subject.

That measures of ability in a school subject are excellent indications
of future achievements is a fact (although by no means a new or even a
recent fact) of utmost importance in the understanding of the principles
underlying the attempts to measure general and specific native
aptitudes or capacities. Those who have assailed the intelligence
tests, insisting that, since most of them measure, as they do, in various
degrees the actual scholastic attainments of individuals, they cannot
represent native capacity nor be used for prediction, have not given
due weight to the fact that under certain conditions demonstrable
ability is a very excellent index of capacity. In connection with the
present results, it should be stated that the permanence of aptitude
in these subjects, especially reading and spelling, was given a very
severe trial inasmuch as throughout these two school years, very
special attention was given to those retarded.¹ Their difficulties
were the subjects of prolonged study; the retarded pupils received
unusual remedial treatment involving greater time and attention than
were devoted to others. The relative positions of the pupils were not
greatly changed however, after two school years of teaching which
was, as a matter of fact, aimed at changing them.

Remarks, then, that intelligence tests merely measure “education"
or “acquired intelligence,” if carrying an implication that they are
therefore not indicative of native capacity, miss the point. Under
conditions of approximate educational equality, achievement is a most
excellent symptom of native capacity. We know native capacity, and
measure it, as we know and measure electricity, gravity or typhoid
fever; we know it from certain symptoms; we measure it by measuring
the symptoms after knowing the correlations of symptoms with the
capacity, form of energy, or disease.

Why Use Intelligence Tests? We may now raise the question:
Why use intelligence tests, since achievement tests predict future
attainments under the conditions of the present study as well as the
intelligence tests?

In the first place, due weight must be given to the conditions of the
present investigation. The data were secured only from those pupils
who were in the same school, who had taken all of the tests, were subject
to the same methods of teaching, the same distribution of time and
attention to the several subjects, and were held to similar ideals of
achievement. Under such conditions the prophecies from achievement
tests are maximum. If we had selected pupils from various
schools, with varied curricula, varied standards of achievements,
varied grade placements of material, varied time allotments, varied
merits among teachers, the correlations with future achievement of the
pupils would probably be lower than those which appear in this study.
In other words, the achievement test may be expected to predict
future achievements with a high degree of fidelity only when all of the
pupils have been subjected to the same, or at least similar,
educational treatment.

¹ As described in one of the writers' monographs, "The Psychology of Reading
and Spelling with Special Reference to Disability." Teachers College
Bureau of Publications, 1922.

Another, and perhaps more important matter, is the equality of
experience with the tests. We know that repeated testing with the
group tests usually results in an increase in the scores. When all of
the individuals have been given the same number of tests at the same
time, the scores will increase but this will not greatly influence the
correlations. If, on the contrary, a group is composed of some who have
never been tested before, others previously tested one, two, etc.,
times, the resulting scores will probably have less predictive value since
each score depends in an appreciable measure upon previous experience
with the test.

The Stanford-Binet is, like any other test, susceptible to practice
effects.¹ If tests are repeated at a wide interval, say a year, this
effect is not great. For the 76 pupils retested after an interval of about
a year, the average gain in IQ was +2.2 points. Similar results have
been found by others.² That the Stanford-Binet is susceptible to
practice effects transferred in any great amounts from experience with
group tests is unlikely judging from results elsewhere published.³
Practice influences, then, for the Stanford-Binet can probably be easily
ascertained and allowed for.

¹ As shown most clearly by Katherine Graves in an unpublished Doctor's
Dissertation, Teachers College Library.

² See a summary by Rugg and Colloton: Constancy of the Stanford-Binet IQ
as shown by Re-tests, Journal of Educational Psychology, 1921, Vol. XII, pp. 316-332
and by Baldwin and Stecher, Mental Growth Curve of Normal and Superior
Children, University of Iowa Studies, Vol. II, No. 1, 1922.

³ Gates, A. I.: The Unreliability of MA and IQ Based on Group Tests of General
Mental Ability, Journal of Applied Psychology, March, 1923.

The influence of educational attainments, school information and
skill, presents a more difficult problem. Over this influence many
disputes have been, and still are, waged. There are two convictions,
the first of which, shared by many, is here given in the words of Terman:
“There is no reason to believe that ordinary differences such
as those obtained among unselected children attending the same
general type of school in a civilized community, affect to any great
extent the validity of the scale." In contrast to this view, is one
expressed by Cyril Burt:¹ “There can be little doubt that with the
Binet-Simon Scale, a child's mental age is a measure not only of the
amount of intelligence with which he is congenitally endowed, not only
of the plane of intelligence at which in the course of life and growth he
has eventually arrived; it is also an index, largely, if not mainly, of the
mass of scholastic information and skill which, by virtue of attendance
more or less regular, by dint of instruction more or less effective, he
has progressively accumulated in school." More specifically, Burt
estimates that of the gross mental age, one-ninth is attributable
independently to age, one-third to native intellectual endowment, and over
one-half to school attainments.

The practical significance of the Binet Test depends upon which,
if either, of these sharply contrasting views is correct. How may
the facts be determined?

Burt undertook a solution in the following manner. For about 300
pupils of ages between 7 and 14, he secured the chronological age,
Binet Mental Age and educational age by means of tests, the results
of which were revised by the teachers. Between scholastic attainments
and mental age was an obtained correlation of 0.91.

The task, now, is to explain the cause of this correlation. Does the
observed correspondence result from an influence of intelligence upon
achievement, or is it the reverse; is the Binet Mental Age determined
"largely, if not mainly, by the mass of scholastic information and skill
... progressively accumulated in school," or are both related to other
factors which influence them similarly if not equally?

The group of pupils displays a wide range of ages, for one thing,
and since both mental age and school achievement increase with age,
part of the observed correlation is due to this factor. Correlating both
mental age and school attainments with age, it is possible, by the
technique of partial correlation, to eliminate the influence of age.
The residual or "partial" correlation of mental age and school
attainment with age eliminated is +0.68, a substantial correlation. We are
left, however, without an inkling as to the cause of this relation.
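
The elimination of age described above can be illustrated with a short
sketch of the first-order partial correlation formula. The function name
and the three input coefficients are hypothetical, chosen only for
illustration; they are not Burt's figures, since the text does not report
his correlations with age.

```python
import math


def partial_correlation(r12: float, r13: float, r23: float) -> float:
    """Correlation of variables 1 and 2 with variable 3 held constant."""
    denominator = math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))
    if denominator == 0.0:
        raise ValueError("variable 3 is perfectly correlated with 1 or 2")
    return (r12 - r13 * r23) / denominator


# Hypothetical coefficients (assumed for illustration only):
r_ma_school = 0.80   # mental age with school attainment
r_ma_age = 0.60      # mental age with chronological age
r_school_age = 0.60  # school attainment with chronological age

residual = partial_correlation(r_ma_school, r_ma_age, r_school_age)
print(round(residual, 4))
```

With these assumed inputs the residual correlation is smaller than the
raw one, which is the pattern the text describes: part of the observed
agreement is carried by age alone.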

Burt went one step farther. He attempted to eliminate pure
intelligence from the correlation. If this could be done, the residual
or partial correlation might fairly be taken to indicate the influence of
scholastic achievement on performance in the Binet test. But to
eliminate native intelligence it was, of course, necessary first of all to
have a measure of it. Burt accepted as a criterion of intelligence
ratings obtained from his Reasoning Tests, "the results of which were
revised by the teachers." Assuming that this was an index of native
intellect, by proper statistical methods, Burt obtained the following
regressions which "indicate the relative proportions in which the
three factors (age, intelligence, and school attainments) together
determine a child's achievement in the Binet-Simon Tests." The equation
is: MA = .54 school attainment + .33 intelligence + .11 age. It
is on this equation that Burt's statement, earlier made, was based.

⁴ "Mental and Scholastic Tests," London: P. S. King, 1921, p. 182.

The validity of Burt's conclusions depends entirely on the validity
of his criterion of pure intelligence. Few would be willing to admit,
on the basis of evidence that Burt has offered, that the reasoning tests
or that teachers' judgments or both together may be properly used as a
criterion by means of which the Binet may be evaluated. Until a test
known to be reliable and also a valid measure of native intellectual
capacity shall have been devised, it will be impossible to estimate the
influence of education in the easy manner which Burt has utilized.

Some of the material collected in the present study affords a check,
even if a very rough one, on Burt's hypothesis. Selecting children
tested by the Binet twice, at an interval of approximately twelve months,
it is possible to compare the growth in mental age (or change in IQ)
with the advancement in educational attainments. If it is true, as
Burt contends, that the score on the Binet test is "largely, if not
mainly" due to "information and skill ... accumulated in school," it
should follow that those children who make relatively great educational
progress should tend to make relatively great advancement on the
mental tests. It may be illuminating also to compare gains in scores
on the NIT, as well as on the Binet, with increases in achievement in
school subjects.

For the Binet, gains in MA and IQ, for the NIT gains in raw scores,
and for the four education tests, gains in terms of 'scaled' units, i.e.,
units equal at any point on the scale, were computed. It will be
advisable to make comparisons within the same grades since to compare
gains at different levels, especially in case of the intelligence tests, is a
questionable practice.

The central tendency of the association of gains in either intelligence
test with gains in the various subjects is, of course, depicted by
the coefficients of correlation. They are shown in Table IX.

In Table IX it appears that advancement in scholastic achievement
exerts a mild influence on scores in NIT. Surveying the several
grades, it appears that rate of reading has the greatest and most
consistent effect, yielding an average correlation of 0.23. The PE of this
coefficient is approximately .07, or less than one-third, hence the
chances are 24 to 1 that the correlation is not due to the inadequacy of
the data.¹

Table IX

Correlation of gains in NIT with gains in

             Reading  Reading   Arith-  Spell-  Writ-
             rate     compre-   metic   ing     ing     MA      IQ
                      hension
Grade III    [?]      -.04      [?]     [?]      .26     .28     .20
Grade IV     [?]       .10      [?]     [?]      .09     .03     .05
Grade V       .33      .20      [?]    -.07     -.04     .00     .02
Grade VI      .10      .38      .22     .00     -.18     .06    -.00
Mean          .23      .175     .06     .04      .03     .105    .07

Correlation of gains in MA with gains in

             Reading  Reading   Arith-  Spell-  Writ-
             rate     compre-   metic   ing     ing     IQ
                      hension
Grade III    -.08     [?]       [?]     [?]     [?]      .965
Grade IV      .03      .04      [?]     [?]     [?]      .887
Grade V       .00     [?]       .00     [?]     [?]      .86
Grade VI     -.03     [?]       .12    -.10    -.12      .91
Mean         -.02      .06      .03    -.04    -.03      .90

Correlation of gains in IQ with gains in

             Reading  Reading   Arith-  Spell-  Writ-
             rate     compre-   metic   ing     ing
                      hension
Grade III    -.04      .02     -.13     [?]     [?]
Grade IV     -.08      .16      .21     [?]     [?]
Grade V       .08      .03     -.03     [?]     [?]
Grade VI      .06     -.04     -.16     [?]     [?]
Mean          .00      .04     -.03     .05    -.06

[?] marks an entry that is illegible in the scanned table.

¹ For any average correlation of .14 or more, or for any single coefficient
of .22 or more, the chances are 4 (or more) to 1 that it is significant of a
genuine association.

Study of the dependence of the correlations on the grade discloses
several tendencies of interest. The influence of writing is appreciable
in Grade III, scarcely so in IV, and tends to be negative in V and VI.
This result is quite in accord with our knowledge of individual children
who wrote with painful slowness at the beginning of the year when the
first tests were given in Grade III. The same was true of reading rate,
the influence of which is a factor in higher grades as well. An extreme
case in Grade III, a child backward in both reading and writing,
scored 35 on the first NIT, and a year later 138. Reading comprehension
shows a change quite the reverse of writing; the correlation,
approximately zero in Grade III, rises to .38 in Grade VI. Spelling
and arithmetic, except for the r of .22 for the latter in Grade VI, show
very low correlations.

While some of the average correlations of scholastic and NIT
gains are too small to satisfy the statistical requirements of substantial
reliability, it is nevertheless a notable fact that all are positive; reading
rate and reading comprehension are reliably so. All told, they indicate
a genuine but slender association which may most probably be
interpreted to indicate one or all of several things: (1) They may be
due to the influence of "scholastic information and skill progressively
accumulated in school" upon the NIT results; (2) they may indicate
the influence of zeal and determination to excel in tests of all kinds;
(3) they may indicate the development of skill, facility, technique in
the taking of group tests which may, in a measure, pervade all of the
types here used. All of these influences have, we believe, some weight;
of the three the first probably has the most; the second, the least
influence. It may then be said with some confidence that educational
advancement does influence achievement on the NIT.

Before taking up the correlations of gains on the Binet Test
with gains in achievement, the range of gains on the former should
be considered. Measured in terms of IQ they were distributed as
follows:

[Table: distribution of IQ changes by class interval; the intervals and
frequencies are not recoverable from the scan.]

The average change, all changes treated without regard to signs,
is 6.36 points IQ, which is somewhat larger than those usually found.
The average of the changes (i.e., plus changes minus minus changes
divided by n) is a gain of 2.2 points IQ. The range of changes is
unusually large, thus giving more than usual significance to any
comparisons with them.

Table IX gives the correlations of gains both in MA and IQ with
other gains. It will be observed that the correlations between IQ and
MA gains are very high, .90 on the average. Both show, consequently,
quite similar associations with other gains.

The general impression of these tables is unmistakable: "no
correlation." Few of the single coefficients are twice the Probable
Error; negative values are nearly as frequent as positive values.
There is no evidence that educational advancement during a school
year in any line here represented has any influence upon the MA or
IQ earned at the end of the year. These data bear witness to the
correctness of Terman's statement that "There is no reason to believe that
ordinary differences, such as those obtained among unselected children
attending the same general type of school in a civilized community,
affect to any great extent the validity of the scale."

Let it be freely admitted that our data afford no conclusive test
of Burt's contentions, with which they surely fail to accord. A year
is a short interval, the educational treatment of our cases was not as
extremely differential as may be imagined, measures of differences are
very unreliable since they partake of the errors of both measures from
which they were obtained, and many factors may all but swamp the
influence of education on the IQ. But giving due weight to all of
these inadequacies and sources of error, had the effect of school
progress on IQ been even moderate, some tendency to positive
correlation should have appeared.

If differences in educational progress do not account for the changes
in IQ, what factors do? Irregularities, spurts, and retardation in
mental growth are possible but improbable explanations. More
likely causes of variations are errors in measurement, and of these
there are three types: those due (1) to defects in the measuring
instrument;¹ (2) to mistakes and misjudgments of the examiner; and (3)
those due to variability in human performance. As in measuring other
human abilities, results are approximations, more or less close, to real
ability. Even in measuring speed of running 100 yards, sources of
error, the same in type if not in magnitude, are encountered.
Stop-watches are not always perfect; timers are less so, and no athlete can
run with exactly the same speed on each of several occasions. Both for
speed of running and mental ability, approximate measures are
extremely useful.

¹ For example, coarseness of the steps on the scale; see Cobb, M. V.: One
Element in the Probable Error of a Mental Age Measurement. Journal of
Educational Psychology, April, 1922.

The results of the study of the relations of educational gains and
changes in scores in mental tests disclose the peculiar value of the
Stanford-Binet Test. It appears to operate in relative independence
of the ordinary variations in school attainments. Based on content
not taught in schools, the Binet Test is more widely applicable; it
enables us to make comparisons in disregard of any save extreme
variations in educational history, formal and informal. The MA and
IQ are universal currencies whose validity, far from perfect, is
nevertheless surprisingly useful, especially for predicting general
scholastic achievements over relatively long intervals.

The group intelligence tests, of the NIT type, when utilized not
only for immediate classification, but for purposes of future prediction,
should be employed advisedly following careful scrutiny of the conditions
which obtained among the pupils to be tested. As Whipple
has pointed out,¹ comparisons of results with general norms are usually
less significant than comparison with norms established within the test
situation itself. To compute IQ's and MA's, and to disregard previous
test experiences of the pupils, is to incur a danger of serious error.²
Properly used the NIT, and others similar, are extremely useful
instruments for measuring general scholastic capacity; indeed, any
test of scholastic achievement has a substantial value for appraising
general ability.

¹ Whipple, G. M.: The National Intelligence Tests. Journal of Educational
Research, June, 1921.

² The enormous errors to which careless practice will lead is indicated in
Gates, A. I.: The Unreliability of MA and IQ Based on Group Tests of General
Mental Ability. Journal of Applied Psychology, March, 1923.

The Nature of Factors Which Cause Correlation among School
Functions. The last question, more theoretical than the others but not
without practical implications, concerns the nature of the cause of
correlations among the tests, particularly the tests of achievement.
The correlations established between the several school functions
(See Table III) may be attributed conceivably to common psychological
factors. Thus the high correlation between reading rate and reading
comprehension, or between reading rate and arithmetic, may be
most simply explained by assuming that each shares identical abilities
with the other. But should the term be singular or plural? Have we
in each case a common factor, one and the same, shared in different
amounts by different pairs of subjects? Or are we to recognize a
multiplicity of abilities, some or all of which are found in varying amounts
in different subjects?

These questions raise an old dispute in which Spearman is recognized
as the leading defender of the single common factor belief and
Thorndike as the exponent of the multiplicity of factors doctrine.
The dispute has not been settled partly because of insufficient crucial
data and, perhaps mainly, because there is not complete accord as to
just what constitutes "sufficient proof" for either theory. Various
criteria have been proposed; few accept any one as satisfactory. The
most that may be done is to appraise the present data by several
of the criteria which stand in reasonably high repute.

As a first step, evidence may be offered in opposition to the notion
that our intelligence tests measure a common mental capacity or general
intelligence conceived as a unitary factor, completely and exclusively,
and that the various linguistic and abstract school subjects depend
mainly or entirely upon this same general capacity. If this were true
in the extreme, arithmetic or any other subject would not predict
itself, as it demonstrably does, better than would a test of intelligence
or a test of some other subject. Furthermore, the two intelligence
tests are not correlated with each other perfectly, but to the extent
0.86, which is the average of four correlations each corrected for
unreliability of the measures. When age, which contributes to this
correlation, is eliminated by partial correlation,¹ the r becomes 0.711.
Finally, the two tests give not only correlations of different magnitudes
with scholastic tests, in the average, but they give different relative
correlations with the several subjects as shown, for example, in
Table III.
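
A correction for unreliability of the kind mentioned above is commonly
made with Spearman's correction for attenuation. The text does not state
the exact formula used, so the sketch below assumes the simplest classical
form, in which the observed correlation is divided by the geometric mean
of the two reliability coefficients. The function name and all numbers
are hypothetical.

```python
import math


def correct_for_attenuation(r_xy: float, rel_x: float, rel_y: float) -> float:
    """Estimate the correlation between two traits free of measurement error.

    r_xy  : observed correlation between test x and test y
    rel_x : reliability coefficient of test x (assumed known)
    rel_y : reliability coefficient of test y (assumed known)
    """
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must lie in the interval (0, 1]")
    return r_xy / math.sqrt(rel_x * rel_y)


# Hypothetical values, not taken from the study:
observed = 0.72
corrected = correct_for_attenuation(observed, rel_x=0.90, rel_y=0.80)
print(round(corrected, 3))
```

Because the divisor is never greater than 1, the corrected coefficient is
never smaller than the observed one; the less reliable the tests, the
larger the upward adjustment.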

These considerations have a bearing on certain practices,
particularly the Accomplishment Ratio procedure, which sets up an
intelligence rating as the criterion of achievement. It is apparent that, at
present, this procedure is valid only for rough appraisals; and more
valid for evaluating general educational achievement by the
consolidation of several subjects than for gauging the attainment in a
particular subject.

¹ By means of the usual formula,
$r_{12.3} = \frac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^{2})(1 - r_{23}^{2})}}$,
by Yule.

These considerations, inasmuch as they are confined to and limited
by the validity of the particular tests used, bear but slightly, if at all,
on the general problem of the nature of the factors which cause
correlations among mental abilities. The theoretical common intellectual
factors may be very imperfectly represented by any of the intelligence
tests now in use.

Spearman has suggested, as a test of his theory, the closeness with
which a series of intercorrelations of mental abilities approximate a
hierarchy which, he assumes, would be obtained if his theory were
correct. The theory demands that if the correlations of a number of
mental functions are arranged in a descending order from left to right
and from top to bottom, as is usually done in a table of intercorrelations,
in every row and in every column, the correlations should be in the
same descending order. This means that in the table of
intercorrelations, each column will show a perfect correlation with every other
column.

To construct such a table, the intercorrelations between each pair
of educational tests were averaged. Each mean coefficient, which
is the average of 30 separate correlations, has been corrected for
attenuation. They are arranged in Table X.

Table X. Showing the Mean Intercorrelations (Each Coefficient the
Average of 30 r's) Corrected for Attenuation and Arranged to Show
the Presence or Absence of a Hierarchy. The Numbers in
Brackets Show the Ranks of r's in the Columns

                     1.              2.            3.            4.
                     Comprehension   Arithmetic    Rate          Spelling
1. Comprehension     ...             .877 (1)      .8[?]3 (1)    .815 (2)
2. Arithmetic        .877 (1)        ...           .776 (3)      .778 (3)
3. Rate              .8[?]3 (2)      .776 (3)      ...           .818 (1)
4. Spelling          .815 (3)        .778 (2)      .818 (2)      ...

[?] marks a digit that is illegible in the scanned table.

The correlations among the columns are not perfect; indeed, they
give the appearance of a chance arrangement. The ranks, reading
from the left to the right column, are: 1, 2, 3; 1, 3, 2; 1, 3, 2; 2, 3, 1.
The suggestion, then, is that the correlations of these tests are not due
to a factor, everywhere one and the same, but to the presence of many
factors which variously combined make up different functions.

Spearman's view is that mental functions embrace two kinds of
factors, a general factor shared by all in different amounts, thus
causing correlation, and a specific factor or factors rarely shared by
two or more different functions. Thus, it should follow that by
eliminating from the correlation between 1 and 2 the association due to a
third function, which gives about equal correlations with them, the
result would be zero, or approximately zero, correlation. By means of
partial correlation it is possible to make such eliminations, which we
have done with results as shown in Table XI.


Table XI

Partial Correlations, First Order

Comprehension with arithmetic (rate eliminated)            = 0.70
Comprehension with arithmetic (spelling eliminated)        = 0.67
Comprehension with rate (arithmetic eliminated)            = 0.64
Comprehension with rate (spelling eliminated)              = 0.60
Comprehension with spelling (arithmetic eliminated)        = 0.48
Comprehension with spelling (rate eliminated)              = 0.41
Arithmetic with rate (comprehension eliminated)            = 0.04
Arithmetic with rate (spelling eliminated)                 = 0.38
Arithmetic with spelling (comprehension eliminated)        = 0.23
Arithmetic with spelling (rate eliminated)                 = 0.40
Rate with spelling (comprehension eliminated)              = 0.41
Rate with spelling (arithmetic eliminated)                 = 0.50

Partial Correlations, Second Order

Comprehension with arithmetic (rate and spelling eliminated)       = 0.63
Comprehension with rate (arithmetic and spelling eliminated)       = 0.51
Comprehension with spelling (arithmetic and rate eliminated)       = 0.23
Arithmetic with rate (comprehension and spelling eliminated)       = -0.05
Arithmetic with spelling (comprehension and rate eliminated)       = 0.23
Rate with spelling (comprehension and arithmetic eliminated)       = 0.41

There are common elements among these functions, as the decrease
in the partial coefficients shows, but the relations are such as suggest not
the presence of a single common element but many elements variously
shared by different functions. Thus comprehension and arithmetic
show a substantial correlation when the factors common to both rate
and spelling have been removed; rate of reading and spelling are
correlated (r = 0.41) by factors independent of comprehension and
arithmetic. There is apparently no single common factor running
through all of these tests which accounts in a complete way for the
relations among them. Apparently we must conceive mental ability
as a multitude of specific abilities of which a large number are active
during each mental act, some common to many, some to few. General
intelligence is the sum of such a multitude; as actually measured by
tests it is the sum of a sampling of the whole. These are the implications
of the data, which give no conclusive result. Although the coefficients
are highly reliable, the functions tested are too few and the relations
to factors, educational and other, here uncontrolled, are too complex
to make possible more than a suggestion of the facts.
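
A second-order coefficient of the kind listed in Table XI can be obtained
by applying the partial correlation formula once more to first-order
coefficients. The text does not spell out the computation, so the
recursion below is an assumption about the method; the function and
variable names are hypothetical. The three inputs are first-order values
as printed in Table XI, each with rate eliminated.

```python
import math


def remove_one_more(r_ab: float, r_ac: float, r_bc: float) -> float:
    """Eliminate a further variable c from the correlation of a and b.

    The inputs may themselves be partial correlations that already hold
    some other variable constant; the result is then one order higher.
    """
    return (r_ab - r_ac * r_bc) / math.sqrt((1.0 - r_ac ** 2) * (1.0 - r_bc ** 2))


# First-order values from Table XI, all with rate eliminated:
comprehension_arithmetic = 0.70
comprehension_spelling = 0.41
arithmetic_spelling = 0.40

# Comprehension with arithmetic, rate and spelling both eliminated:
second_order = remove_one_more(
    comprehension_arithmetic, comprehension_spelling, arithmetic_spelling
)
print(round(second_order, 2))
```

Since the inputs are rounded to two places, the result is only
approximate; it remains substantial, in line with the statement that
comprehension and arithmetic stay correlated after rate and spelling are
removed.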



A STUDY OF THE RELATION BETWEEN ABILITY
TO LEARN AND INTELLIGENCE AS MEASURED
BY TESTS

O. J. JOHNSON

Division of Research, Public Schools, St. Paul, Minnesota

One of the fields of investigation yet scarcely touched is
the relation between intelligence and efficiency of learning. If group
intelligence examinations measure the kind of traits required in
learning, it should be determined under experimental conditions how
important the abilities in question are for various types of material
and under different conditions. Such knowledge would be of great
help to educational psychology in supplementing data already at hand
on the relationship between mental ability and scholarship as measured
either by marks or scores on achievement tests. It would also help to
clear up points in dispute which retard the advance of intelligence
testing and would direct such activities into more fruitful channels.

The experiment reported below is an attempt in the indicated
direction. It deals with the acquisition of new habits in the case of
material which was already familiar; i.e., in learning to read inverted
print in a mirror. The subjects were 60 university students in the
writer's class in educational psychology; 12 were men and 48 were
women. The experiment, carried on as part of the course, was
conducted as follows:

All practice was done out of school hours. The student was
directed to place a good mirror directly in front of himself so that it
faced him. The book used was Starch's "Educational Psychology"
and this stood right side up on the table so that it faced the mirror.
The student's task was to read the page by looking at its inverted
image in the mirror. A day's practice consisted of 10 minutes' work
and keeping careful record of the number of words read. The
experiment continued during 20 days of practice.

At first the reading was found to be very confusing. It was
necessary to make a number of adjustments which conflicted with
long-established habits. Not only was the print inverted, but lines had to
be read from right to left. It was also very hard to distinguish between
such letters as "p" and "q," "d" and "b," and "a" and "s," not so
much on account of the difficulty of seeing them as of the mental
confusion momentarily occurring.

During the course of the experiment a number of group
intelligence tests were given to the class. They were Army Examination
Alpha, Form 8; Thurstone Psychological Examination, Test
IV; Haggerty Reading Examination, Sigma 3; Van Wagenen
Association A Tests;¹ and two unpublished forms of Geometrical Figures
Tests devised by the writer. The average score on all of these tests
formed the mental ability standing of each student.

At the conclusion of the 20 days of practice, correlations were
figured between the average scores on all the tests and performance
in mirror reading. This was done twice, first with the average number
of words per day, and second with the improvement in ability to read.
For this, the difference between the average number of words read
during the first three days and last three days of practice was used as
the measure. The correlations were as follows:

With average number of words read per day .....  .34 ± .08
With improvement in ability to read ...........  .46 ± .07

These correlations were calculated according to the Pearson
Product-Moment Formula. They show that there exists a fairly
large positive relation between the ability to become efficient at
learning to read inverted print and intelligence as it is measured by the
usual group tests. It is interesting to note that it is not the absolute
amount which a person reads that is most important in this connection,
but rather the amount of improvement. In other words, it is rapidity
of learning, that is, acquiring new connections, that is most closely
related to mental ability.
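
The ± figures attached to the two coefficients appear to be probable
errors. A minimal sketch, assuming the classical formula
PE = 0.6745 (1 - r²) / √n and the 60 students reported above, reproduces
both of them. The function name is hypothetical, and the text itself does
not state which formula was used.

```python
import math


def probable_error(r: float, n: int) -> float:
    """Classical probable error of a product-moment correlation r from n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    return 0.6745 * (1.0 - r ** 2) / math.sqrt(n)


n_students = 60
pe_words = probable_error(0.34, n_students)        # average words per day
pe_improvement = probable_error(0.46, n_students)  # improvement in reading

print(round(pe_words, 2), round(pe_improvement, 2))

# How many probable errors each coefficient is removed from zero:
print(round(0.34 / pe_words, 1), round(0.46 / pe_improvement, 1))
```

On this reading, both coefficients exceed four times their probable
errors, which supports treating the relation as genuine rather than as an
accident of sampling.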

In order to attack the problem in a somewhat different way, the
curve of improvement was drawn for all 60 students as a group. In
Fig. 1 this curve is represented by the heavy line in the middle. A
glance at this curve indicates that the average increase in ability to
read was regular from day to day; furthermore that the students had
not reached their limit of improvement when the experiment ended.

The data for the other curves in Fig. 1 were secured as follows:
The students were divided into two groups of 30 persons each.
One group was composed of the students standing above the class
average in the intelligence tests. Their curve is labeled with the
figure 1. Similarly curve 2 shows the improvement of the students
who fell below average in the tests.

Figure 1 shows the performance in mirror reading for different
groups of a university class arranged according to their mental
abilities as indicated by test results.

Curve 1 is for the 30 students who scored above the class average
in the tests.

Curve 2 is for the 30 students scoring below the class average.

Curve 3 is for the 15 highest students.

Curve 4 is for the 15 lowest students.

The heavy central curve is for the total group of 60 students
composing the class.

¹ Van Wagenen, M. J.: Graded Opposites and Analogies Tests. Journal of
Educational Psychology, Vol. XI, May-June, 1920.

In studying these curves, one would say that the relation between
mental ability and efficiency in learning to read is fairly close, because
it is evident that the students who stood above the average in
intelligence started with greater initial performance in mirror reading and
that they progressed considerably faster. While this is true when
students are considered as groups, individual progress records show
that there are large variations among those who made very nearly the
same scores on the mental tests. Further treatment of the data will
bring out these facts more clearly and confirm the results already
secured by the method of correlation.

Selection was next made of the students who ranked in the upper
and lower quartiles in the intelligence tests. It was thought that the
difference between the rates of learning to read would be greater in the
case of these groups than between the upper and lower halves.
However, curves 3 and 4 in Fig. 1 do not bear out this assumption. The
striking thing is the lack of difference between the reading abilities of
the groups comprising the upper one-fourth and upper half of the class,
or of the lowest fourth and lower half. In short, it appears that rather
wide differences in mental ability do count in work of this type, but
that small differences are overbalanced by other factors.

It was thought worth while to check the results just mentioned in
yet another way which if possible would show still more clearly the
relation with intelligence as distinguished from other factors. In this
case, the scores of students were arranged in pairs. That is,
two students were selected who made identical, or nearly identical,
scores on the intelligence tests. In this way, it was possible to pair all
students, making two groups of 30 students each which may, for
practical purposes, be considered to be equal in intelligence. The
curves of improvement for these groups are shown in Fig. 2.

Figure 2 shows the performances in mirror reading of two groups
of thirty students each. These students were paired on the basis of
equality in intelligence as measured by the average of several tests.
This made two groups of equal mental abilities.
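
The pairing procedure can be sketched as follows. The text does not say
how the two members of each pair were allotted to the two groups, so the
alternating assignment below is an assumption, and the student labels and
scores are invented for illustration.

```python
from statistics import mean


def matched_groups(scores: dict[str, float]) -> tuple[list[str], list[str]]:
    """Pair students with adjacent test scores and split each pair across two groups."""
    ranked = sorted(scores, key=lambda student: scores[student])
    group_a: list[str] = []
    group_b: list[str] = []
    for pair_index in range(len(ranked) // 2):
        lower = ranked[2 * pair_index]
        higher = ranked[2 * pair_index + 1]
        # Alternate which group receives the higher scorer (an assumption),
        # so that neither group is systematically favoured.
        if pair_index % 2 == 0:
            group_a.append(lower)
            group_b.append(higher)
        else:
            group_a.append(higher)
            group_b.append(lower)
    return group_a, group_b


# Hypothetical mental-test averages for eight students:
scores = {"s1": 101, "s2": 99, "s3": 120, "s4": 118,
          "s5": 87, "s6": 88, "s7": 110, "s8": 111}

group_a, group_b = matched_groups(scores)
mean_a = mean(scores[s] for s in group_a)
mean_b = mean(scores[s] for s in group_b)
print(group_a, group_b, mean_a, mean_b)
```

With groups matched in this way, any remaining difference between their
learning curves has to be ascribed to something other than the measured
test scores, which is the comparison made with Fig. 2.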

If the ability to read print in a mirror is to some extent due to
intelligence, we should expect to find less divergence in performance
between the mentally equal groups of Fig. 2 than between the groups
in Fig. 1. That this is true in general is clearly shown by inspection of
the graphs. Mental ability undoubtedly enters to a considerable
extent. If the correlation were perfect, the two groups in Fig. 2 should
make identical records throughout; but this is not the case. One
group is uniformly superior throughout. This, one is forced to
conclude, must be due to other traits than those measured by the
intelligence tests. The results are however remarkable in showing such a
close correlation, when one considers the rather mechanical and
uninteresting nature of the task of learning to read inverted print of a
difficult thought content.

Oonclusions 

1. We need carefully conducted experiments to show the relation 
between different types of learning and intelligence. 

2. The results of this study suggest that in any experiment on
learning account should be taken of the intelligence of subjects before
results are worked out or conclusions are drawn.

3. It suggests one reason for the lack of agreement between results
of investigations similar in nature, but where the subjects were
different.

4. It raises a question as to the validity of much experimental work
done in the past in which no attention was paid to the mental abilities
of the subjects.



SCALES FOR MEASURING JUDGMENT OF
ORCHESTRAL MUSIC

M. R. TRABUE

Director of the Bureau of Educational Research, University of North Carolina

Insofar as schools undertake to change the musical tastes of their
pupils, it is desirable that scales be available for measuring the extent
of the changes that occur during any period of training. Seashore
has devised tests which measure "Musical Talent" in a number of its
elementary phases, and other investigators have attempted to devise
measures for musical intelligence and for ability to recognize moods in
music, but as yet no one has reported a test that measures ability to
distinguish between good music and poor music. Although it is
important that the school should discover pupils whose talents make it
possible for them to become producers of music, it is equally important
that it should measure the extent to which its efforts to improve the
tastes of consumers of music are successful. It was in an effort to
supply such measuring instruments for taste or judgment in orchestral
music that Mr. M. L. Mohler began the present study.*

Nature of the Mohler Tests.—Mr. Mohler's tests for measuring
ability to judge orchestral music involve the use of phonographic
records of 16 different musical compositions. The relative merits of
these records have been determined and have been assigned numerical
indexes by combining the judgments of expert musicians with the
judgments of other intelligent adults in a manner to be described
later. Records regarding whose merit the experts and the intelligent
laymen were not in substantial accord have been omitted from the
scales here proposed for general use. The tests as presented assume,
therefore, that one record is better than another if it was considered
better by both the musical experts and the other intelligent persons
whose judgments are reported in the following pages.


* Mr. Mohler planned the investigation and completed the field work in 1920
under the joint direction of Professors Thomas H. Briggs and Truman L. Kelley.
Unfortunate circumstances prevented the completion and publication of the study.
The results, however, seemed to the present writer so valuable that he asked and
received from Mr. Mohler permission to prepare the report for publication.
The reader will therefore find in the report certain omissions and imperfections
which are due to the fact that the work was done by different persons at its various
stages.

The complete list of phonograph records used in this study and the
numbers by which they will be designated in this report are
given below:


Record
number¹   Name of selection                             Composer

   1      Valse, from Ballet Music of ...               Gounod
   2      Hunt in the Black Forest                      Voelker
   3      Wedding of the Rose                           Jessel
   4      Unfinished Symphony, First Movement           Schubert
   5      Sparklets                                     Walter E. Miles
   6      Venetian Love Song                            Nevin
   7      Deer Dance                                    Skilton
   8      Turkish March, from "Sonata in A Major"       Mozart
   9      How Beautiful Art Thou                        Bouinconlis
  10      Cuddles                                       Wm. Penn
  11      Intermezzo No. 2                              Wolf-Ferrari
  12      Largo, from "New World Symphony"              Dvorak
  13      Dance of the Goblins                          Loraine
  14      Sounds from the Music Room                    Smith
  15      Triumphal March, from "Aida"                  Verdi
  16      Introduction, Act III, "Lohengrin"            Wagner
  17      Cortège du Sirdar                             Ivanov
  18      Chaconne                                      Durand
  19      Anitra's Dance, from "Peer Gynt Suite"        Grieg
  20      Nightingale and Frogs                         Eilenberg
  21
  22

¹ These numbers are for convenient reference only. They have no relation
whatever to the quality of the music.


The tests are arranged and administered in such a manner that a
person who can detect small differences in the general merits of two
selections will receive a high score, while one who can detect only the
larger differences will receive a lower score. Three or four records
are played in a group, one record immediately after the other. Each
listener is asked, when a group of records has been played, to make a
note indicating which record he considered "best," which "next best,"
and which "poorest." The differences in musical values of the records
in the first group in each test are relatively large, while the differences
in each succeeding group are smaller and smaller. The number of
points allowed as a score for successfully rating each group of records
is dependent upon the size of the errors in one's judgment, no credit
at all being given if the person has rated the records in exactly the
reverse order from that which is considered correct, and maximum
credit being given when the correct order has been accurately
indicated. A low score therefore indicates little or no success in judging
the relative merits of these orchestral selections, while a high score
indicates a greater degree of success.

Fig. 2. Scale Gamma and Scale Delta: Relative Musical Values.

The order in which the records are to be played in giving a test
and the calculated musical values of each record are given in Tables I
and II and in Figures 1 and 2. Figure 1 and Table I give the
arrangement of Scales Alpha and Beta, while Figure 2 and Table II give the
arrangement of Scales Gamma and Delta. Scale Alpha and Scale
Beta each consist of four groups, each group containing four records,
while Scale Gamma and Scale Delta each consist of five groups, each
group containing three records. In the diagrams, distance to the right
indicates increasingly high quality, while the distance from the top
indicates the order in which the records are to be played. In Group
A of Scale Alpha, for example, the second record is the best and the
third record played is the poorest. For each group in Scale Alpha
there is a corresponding group in Scale Beta, composed of different
records but having the same general arrangement and intervals of
quality. Similarly, for each group in Scale Gamma there is a
corresponding group of records in Scale Delta.

TABLE I.—RELATIVE VALUES AND INTERVALS BETWEEN VALUES IN SCALES ALPHA
AND BETA

TABLE II.—RELATIVE VALUES AND INTERVALS BETWEEN VALUES IN SCALES
GAMMA AND DELTA

             Scale Gamma                            Scale Delta
Group    Record   Relative              Group    Record   Relative
number   number   quality    Interval   number   number   quality    Interval

  A1        4      +1.20      -2.55       A1       18      + .55      -2.53
  A2       20      -1.35      +1.35       A2        2      -1.98      +1.08
  A3       12        0        +1.20       A3        5      - .90      +1.45
  B1       18      + .55      - .90       B1        4      +1.20      - .92
  B2       15      - .35      - .82       B2       19      + .28      - .76
  B3        9      -1.17      +1.72       B3        8      - .48      +1.68
  C1        2      -1.98      + .60       C1        9      -1.17      + .56
  C2       13      -1.38      + .59       C2       16      - .61      + .61
  C3       10      - .79      -1.19       C3       12        0        -1.17
  D1        5      - .90      + .90       D1       20      -1.35      + .80
  D2       12        0        - .48       D2        3      - .55      - .48
  D3        8      - .48      - .42       D3       17      -1.03      - .32
  E1       15      - .35      - .26       E1       10      - .79      - .38
  E2       16      - .61      - .42       E2        9      -1.17      - .21
  E3       17      -1.03      + .68       E3       13      -1.38      + .59

Scores in the Tests.—Several schemes for obtaining a numerical
score were tested, and the one finally selected and recommended for
general use is neither the simplest nor the most accurate possible.
It is, however, a compromise which involves both of these merits in a
satisfactory degree. In any given group of records played, the only
judgments that are allowed to count in the determination of one's
score are his ratings of the best record and of the poorest. This
method fails to employ all of the information available, especially in
Scales Alpha and Beta where each group has a "next best" and a
"next poorest" record. An actual trial of scoring methods based on
ratings of all the records in a group failed, however, to increase
perceptibly the coefficient of correlation between two measurements of
the same group of persons. The present scheme was therefore adopted
as being reasonably accurate and yet simple enough for practical
purposes.

The method for obtaining a score can best be explained by
illustrations. If the relative values of three records in Scale Gamma are in
the order 1-2-3 from the poorest to best, and a listener rates them in
1-2-3 order, he is to be given 4 points credit for that group: 2 points
for having judged the best record correctly, and 2 points for having
judged the poorest one correctly. If the record of intermediate value
is judged "best," however, while the best one is rated as "middle,"
2 points are to be given for having located the poorest correctly, and 1
point for having made only a one place error in rating the best record.
In short, 2 points of credit are to be allowed for placing a best record
(or a poorest record) in its correct position with reference to the other
two, 1 point for placing it in the position next to its rightful one, and
no credit at all if it is placed at the wrong end of the group. The
maximum credit for each of the five groups in Scale Gamma (or Scale
Delta) is therefore 4 points. If the first record were poorest and the
third one best in a group, a 1-2-3 rating would receive 4 points credit,
a 1-3-2 rating 3 points, a 3-1-2 rating 1 point, and a 3-2-1 rating no
credit at all.
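
The rule for a group of three records may be sketched in a few lines of
Python. This is an illustrative sketch only: the function name
score_group_of_three, and the convention of writing both the correct
order and the listener's rating as lists running from poorest to best,
are assumptions made here and are not part of Mr. Mohler's materials.

```python
def score_group_of_three(correct_order, rating):
    """Score one three-record group of Scale Gamma or Scale Delta.

    Both arguments list the three records from poorest to best.
    Only the placements of the truly best and truly poorest records count.
    """
    true_poorest, true_best = correct_order[0], correct_order[-1]
    # In the listener's rating, index 0 is "poorest", 1 "middle", 2 "best".
    # Each record earns 2 points less one point per place of error.
    best_credit = 2 - abs(2 - rating.index(true_best))
    poorest_credit = 2 - abs(0 - rating.index(true_poorest))
    return best_credit + poorest_credit


# The four ratings discussed in the text, with the correct order 1-2-3.
for rating in ([1, 2, 3], [1, 3, 2], [3, 1, 2], [3, 2, 1]):
    print(rating, score_group_of_three([1, 2, 3], rating))
```

Under these assumptions the four ratings receive 4, 3, 1, and 0 points
respectively, in agreement with the illustration above.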

The method of scoring for Scales Alpha and Beta is similar to that
for Gamma and Delta in that only the ratings of the best and of the
poorest records are considered. Three points are to be allowed for
having rated the best record as "best," two points for having rated it
"next best," one point for having rated it "next to poorest," and no
credit for having judged it "poorest." The rating of the poorest record
is credited in the corresponding manner. The maximum credit for each of
the four groups in Scale Alpha (or Scale Beta) is therefore 6 points. If
the correct order in a group were 1-2-3-4, from poorest to best, then a
6 point credit would be granted for a rating of 1-X-X-4,¹ a 5 point credit
for a 1-X-4-X rating, a 4 point credit for a 1-4-X-X or an X-X-1-4
rating, a 2 point credit for an X-4-1-X rating, a 1 point credit for a
4-X-1-X rating, and no credit for a 4-X-X-1 rating.
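
The same rule extends to groups of four records and to the total score
for a whole scale. The sketch below is illustrative only: the names
score_group and score_scale are hypothetical, and it assumes that the
poorest record is credited in the same way as the best, which is what
the stated maximum of 6 points per group implies.

```python
def score_group(correct_order, rating):
    """Credit for one group; both lists run from poorest to best."""
    top = len(correct_order) - 1  # index of the "best" position
    true_poorest, true_best = correct_order[0], correct_order[-1]
    # The best record earns one point for every place above "poorest".
    best_credit = rating.index(true_best)
    # The poorest record earns one point for every place below "best".
    poorest_credit = top - rating.index(true_poorest)
    return best_credit + poorest_credit


def score_scale(groups):
    """Total score for a scale, given (correct_order, rating) pairs."""
    return sum(score_group(correct, rating) for correct, rating in groups)


correct = [1, 2, 3, 4]
print(score_group(correct, [1, 4, 2, 3]))      # a 1-4-X-X rating
print(score_group(correct, [4, 2, 1, 3]))      # a 4-X-1-X rating
print(score_scale([(correct, correct)] * 4))   # four groups rated perfectly
```

A listener who rates every group correctly thus earns 24 points on a
scale of four four-record groups and 20 points on a scale of five
three-record groups.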

The maximum possible score in Scale Alpha or in Scale Beta is thus
seen to be 24 points, while the maximum in Scale Gamma or in Scale
Delta is 20 points. For practical purposes one may consider Scales
Alpha and Beta as equivalent to each other. Scales Gamma and Delta
make up another pair, differing from Alpha and Beta but comparable
with each other. No table has yet been worked out for converting
Alpha or Beta scores into equivalent Gamma or Delta scores, or vice
versa. It is recommended that Gamma and Delta be used where
possible, because of the greater ease with which people are able to judge
the relative merits of three selections rather than four at once.


¹ The X in these series stands for either of the two records having values above
the poorest but below the best in the group.

Sixty pupils in Grade VI of the public schools at Hackensack,
New Jersey, were tested with a preliminary form of Scale Beta.
Their scores ranged from 3 to 22, with a median value of 10.8 and
a semi-interquartile range of 3.2 points. Two hundred and thirty
pupils in high schools of New Jersey and New York City were
tested with preliminary forms of Alpha and Beta. Their scores
ranged from 4 to 23, with a median value at 12.9 and a
semi-interquartile range of 3.4 points. These scores are not satisfactory standards
for the present forms of Alpha and Beta, but they are given here as
indications of approximately what may be expected from the use of
these scales.

Scales Gamma and Delta have been employed in their present form
in testing a few students in North Carolina. The results obtained in
the schools of Greensboro are given in Table III.


TABLE III.—SCORES IN JUDGMENT OF ORCHESTRAL MUSIC AT GREENSBORO, N. C.¹
Scales Gamma and Delta

Type of school        Public elementary    N. C. C. W. Training          Public high
School grade              5      6      7      5      6      7      I     II    III     IV

Number of pupils                87    310     90     31           103            41
Median score            8.1           9.7    9.8          12.0   11.4                 12.1
Q                       2.0                  1.8           1.0    2.2           1.0

¹ There are only seven grades in the elementary school in North Carolina.


It is interesting to note that the pupils in the Training School of the
North Carolina College for Women had higher average scores than the
pupils of the same grades in the public schools where less attempt had
been made to develop musical appreciation.

Table IV gives the scores of such college students as the writer has
thus far been able to measure with Scales Gamma and Delta. In
general these college students are not superior to the Greensboro High
School students, but this is probably because they have had no more
training in music appreciation than is offered by the Greensboro
High School. Elementary school pupils living near Philadelphia,
New York, Boston, or other musical centers would undoubtedly do
better on these tests than North Carolina college students, who as a
rule have had very few musical experiences.

TABLE IV.—SCORES OF COLLEGE STUDENTS IN MUSICAL JUDGMENT
(Scales Gamma and Delta)

College                    N. C. College for Women          University
                                                             of N. C.
Class                  Freshmen   Sophomores   Juniors      Sophomores

Number of students                    10                        46
Median score                         11.0       11.0           12.0
Q                         1.0         2.3        2.3            2.3

Effect of Training.—The most significant finding of this study was
that the characteristics measured by these tests are easily improved
by training. With tests of a similar character in judging English
poetry, it has been found that training has relatively little effect on
success in the tests.¹ In the case of orchestral music, however, there
seems to be an unusually large opportunity for the improvement of
taste through musical training and experiences.

In Greensboro, N. C., the supervisor of music last year held a
Music Memory Contest. One Grade VI class went into special
training for the event and won the prize. When tested this year with
Scale Gamma, this Grade VII class made a median score of 14.1, with
a Q of 2.0. This score is not only superior to the scores of other
seventh grades in the city (9.7) but is higher than any group of high
school or college students in the state have thus far made. Only two
or three of the selections used in Scale Gamma were in the list of
compositions that had been studied in preparation for the Music Memory
Contest.

A controlled experiment was planned and conducted by Mr. Mohler
to determine the effect of a short period of training on ability in judging
music, ability being measured by the preliminary forms of Scales
Alpha and Beta. Two groups of pupils in the same grade of the same
school were tested by Scale Alpha (or Scale Beta) to make certain that
they had approximately the same scores. One of the groups was then
given a series of eight lessons in music appreciation, while the other
group was given no special training in music. The lessons, which
required 40 minutes each week for eight weeks, were carefully planned
and administered, care being taken to avoid playing or discussing the
selections used in the test. At the end of the training period, both
the trained group and the untrained group were again measured by
the scale previously used. Table V shows the median scores and
semi-interquartile ranges for both trained and untrained groups in three
different communities where the experiment was performed, and the
differences between these measures before and after training.

¹ Abbott, Allan and Trabue, M. R.: A Measure of Ability to Judge Poetry.
Teachers College Record, Vol. XXII, No. 2, March, 1921.


TABLE V.—EFFECT OF TRAINING ON SCORES IN MUSICAL JUDGMENT

                                      First test      Second test    Difference  Difference
School and group             Scale   Median    Q     Median    Q     in median     in Q

Hackensack High School
  26 pupils, untrained                12.3    2.4                      + .7        +1.1
  16 pupils, trained                          1.7                     +10.0        - .2
Hackensack Sixth Grade
  30 pupils, untrained                10.7           11.0              + .3        + .2
  30 pupils, trained         Beta     11.0    2.8    19.0    1.3       +8.0        -1.5
Summit High School
  21 pupils, untrained       Beta     10.3                             +2.2          0
  17 pupils, trained         Beta     10.5    2.1    18.8    2.1       +8.3          0
  80 pupils, untrained       Alpha    16.3           16.0    2.8       - .3        - .6
  40 pupils, trained         Alpha    14.7           20.5    1.8       +5.8        -2.3
Horace Mann High School
  11 pupils, untrained       Alpha    14.7           14.5    6.1       - .2        +3.1
  11 pupils, trained         Alpha    16.5    3.2    19.7    1.5       +3.2        -1.7


In almost every case, the group that received training increased its
median score 10 or more times as much as the control group. It is
significant also that the control groups tended to increase their
dispersion or variability while the trained groups decreased their measures of
dispersion. The smallest amount of improvement in any group
trained was at the Horace Mann High School in New York City
where the pupils had a relatively high score when the experiment
started. If one accepts as fairly valid the relative values used here for
the records in these scales, it is clear that a well arranged course in
listening to music can in a short time work a great improvement in
the accuracy of pupils' judgments of orchestral selections.

Derivation of the Scales.—The fundamental assumption that formed
the basis of the numerical values attached to each record was the
theorem used by Hillegas in developing his English Composition Scale,
by Murdoch in preparing her Scales for Hand Sewing, and by others in
still other fields, that "Equally often noticed differences are equal,
unless never noticed or always noticed." If 50 per cent of the judges
thought record A better than record B, while the other 50 per cent
thought B better than A, one could fairly certainly assume that the
two were approximately equal in merit. If 60 per cent thought A
better than B, and 60 per cent thought B better than C, then one
could assume that A was just as much better than B as B was better
than C.

By accepting the useful assumptions that judges will in judging
each record be distributed symmetrically about the true value
according to the normal surface of frequency, and that the dispersion
of the judgments on one record will be equivalent to the dispersion on
any other, it was possible, by the help of statistical tables, to convert
any percentage of "better" judgments into a numerical statement of
the amount of difference between the two records, the difference
being stated in terms of some measure of dispersion or variability of
judgments as a unit. For the scales in judging music, the PE, or
median deviation from the median, was chosen as the unit by which
to measure the differences in quality between the records.

Of 368 intelligent persons who listened to record number 1 and
record number 2 played in the same group, for example, 329 persons
(89.4 per cent) judged that number 1 was better than number 2, while
the other 39 decided that number 2 was better than number 1. When
89.4 per cent of a large group of judges decide that one record is better
than another, one may assume that, for these particular judges at least,
the one is distinctly better than the other. It is at this point that the
table of values of the normal probability integral corresponding to
values of x/PE becomes useful. Such a table shows that if 89.4 per
cent of the judges think number 1 better than number 2, then number 1
is for them 1.85 PE better than number 2.
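
The table lookup described here can be reproduced with the normal
distribution in Python's standard library. The sketch below is offered
only as an illustration of the conversion, not as the procedure actually
followed in the study: the function name pe_difference is hypothetical,
and the judgments are assumed to be normally distributed, as stated in
the preceding paragraph.

```python
from statistics import NormalDist

# One probable error (PE) is the deviation within which half of a normal
# distribution falls: about 0.6745 standard deviations.
PE_IN_SD = NormalDist().inv_cdf(0.75)


def pe_difference(votes_for, total_judges):
    """PE units by which one record exceeds another, from the count preferring it."""
    proportion = votes_for / total_judges
    # inv_cdf is undefined at 0 and 1, which corresponds to differences
    # "never noticed or always noticed."
    return NormalDist().inv_cdf(proportion) / PE_IN_SD


# Record number 1 against record number 2: 329 of 368 judges preferred number 1.
print(round(pe_difference(329, 368), 2))
```

With 329 of 368 judges preferring number 1, the function returns a
difference of about 1.85 PE.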

In a similar manner the other two records of this group of four
were evaluated by means of the distribution of judgments on record
number 1. Record number 3 was considered poorer than number 1
by 60.6 per cent of the group and was therefore rated as .4 PE poorer,
while record number 4 was judged poorer than number 1 by only 30.7
per cent of the group, which indicated that it was .75 PE better than
number 1. As measured by the distribution for record number 1,
therefore, the poorest record in that group was number 2, which was
1.85 PE below number 1. The next was number 3, which was .4 PE
below number 1, and the best was number 4, which was .75 PE above
number 1.

The differences between the four records in each of the other groups
were similarly determined and stated in PE units. Each set of four
records played as one group contained one of the records played in
the first or second group, which made it possible to relate the values
of records in any group to the values of records in any other group.
The final value assigned to each record shows its average relation to
record number 12, which was chosen as a general reference point
because of its location near the center of the values for the "really
good selections."

In actual practice the calculation of values was more complicated
than the above discussion would indicate. In the first place there
were two sets of judges. The expert judges agreed more closely
among themselves, which resulted in a shorter PE, which in turn caused
the differences between selections to have larger indexes when
measured by the distribution of expert judgments than when measured
by the distribution of ordinary judgments. This difference between
the two groups of judges was met by taking as a final value the
arithmetic mean or average of the two determinations. Record number 15,
for example, was judged by the experts as .68 PE below the reference
point (number 12) and by the other judges as .02 PE below this point.
The final value assigned to number 15 was therefore -.35 PE from
number 12.

For both types of judges and within each group of four records
played together, four distributions were available for measuring the
differences between the four records, and these did not agree perfectly
even in their indications of the order in which the records should be
ranked. The distribution of non-expert judges for record number 17,
for example, indicated that the correct order from poorest to best in
that group was 17-3-16-15, while the distribution for record number
16 indicated a 17-16-15-3 order, and the other two distributions
each indicated 17-16-3-15 as the correct order. It is needless to add
that the amount by which No. 16 differed from No. 17 had at
least three different solutions.

In order to provide a uniform rule for action it was decided
to measure the difference between any two records in a group primarily
by the evidence of the distribution for the two records concerned.
Assuming, for example, that number 16 was next in quality above
number 17, the direct comparison showed that 64.4 per cent considered
number 16 the better, and that one might therefore rate it as .55
PE better, so far as these two distributions were concerned. But a
similar direct comparison showed number 15 to be .67 PE better than
number 17, and comparison of number 15 and number 16 showed
number 15 to be .10 PE better than number 16. Another useful measure
of the difference between number 17 and number 16 was therefore
indirectly available by subtracting .10 PE (the distance from number
16 to number 15) from .67 PE (the distance from number 17 up to
number 15), the result being .57 PE. In a similar manner, by
subtracting the difference between number 16 and number 3 (.24 PE) from that
between number 17 and number 3 (.49 PE), still another useful indirect
measure (.25 PE) was found of the superiority of number 16 over
number 17. Twice as much weight was arbitrarily given in each case
to the direct measurement as was given to each of the other two
determinations just described. The final measure of the interval from
number 17 to number 16 was therefore .48 PE [(2 × .55 + .57 + .25)
÷ 4 = .48].
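
The weighting rule just described amounts to a weighted mean in which
the direct comparison counts twice. The sketch below is illustrative
only: the function name combined_interval is hypothetical, and the
inputs are the readings under which the arithmetic closes on the stated
result of .48 PE (a direct measure of .55 PE and indirect measures of
.57 PE and .25 PE), which should be treated as an assumption wherever
the printed digits are unclear.

```python
def combined_interval(direct, indirect_measures, direct_weight=2):
    """Weighted mean of one direct and several indirect PE estimates."""
    total = direct_weight * direct + sum(indirect_measures)
    return total / (direct_weight + len(indirect_measures))


# Interval from record 17 up to record 16, in PE units (assumed readings).
direct = 0.55            # from 64.4 per cent judging number 16 the better
via_15 = 0.67 - 0.10     # (17 up to 15) less (16 up to 15)
via_3 = 0.49 - 0.24      # (17 up to 3) less (16 up to 3)

print(round(combined_interval(direct, [via_15, via_3]), 2))
```

Giving the direct comparison double weight keeps the estimate anchored
to the two records actually being compared, while the indirect routes
through a third record serve only to steady it.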

There were still other complicating factors in determining the
values of the intervals of quality between records, but there is not
space in this article to describe fully the treatment accorded to each
one. Tables VI and VII give the original data concerning the
judgments used. The non-expert judges were for the most part teachers
and normal school students, some of whom had received musical
training, although most of them were studying other branches of
Education. Classes at Teachers College, Columbia University,
supplied some of these judges, while others were from the normal schools
at Worcester, Hyannis, Framingham, Fitchburg, Bridgewater, Boston,
and Providence. All of these judges were graduates of standard high
schools and at least 10 years of age. Most of them were women.
The expert group was composed of selected music supervisors, teachers,
writers, and publishers attending the sessions of the Eastern Music
Teachers Association at New York City, in May, 1920. It is
unfortunate that the number of expert judges was not larger.

Table VIII and Figure III show the final values of the records as
calculated from the data given in Tables VI and VII, all values being
stated in terms of record number 12 as a constant point of reference.
Record number 1, for example, is .35 PE above number 12 if the
judgments of the experts are used and .21 PE above if the non-expert
judgments are employed. The value finally used is midway between
these two, at .28 PE above number 12. Table VIII also shows the
number of times each record is played in administering each of the
four scales.

TABLE VI.—RATINGS OF PHONOGRAPH RECORDS BY NON-EXPERT JUDGES
Frequency with which records listed on horizontal scale were judged better than
the records listed on the perpendicular scale at the left


Series Ai 

303 i\id{(C8 
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3?1 judgoa 
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No. I 
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■ 
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0 


261 

197 

SS9 

12 


06 

n 

li 

1 
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102 

18 
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10 
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20S 

m 


Sariea B, 
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No. 10 
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No. 2 

No. 18 

No. 19 
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16 
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m 


10 


n 

100 
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Series G, 

318 Judges 


Serloa Hj 200 Judgea 

No, 

No. 6 

No. 4 

No. 3 

No. 2 

No. 

No. 21 

d 

Z 

No. 22 

No. 13 
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227 

■■ 

21 
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4 
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00 

88 

2 
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1 
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8 

01 

222 


46 

22 
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2 

226 
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13 

20 

43 

1 



Note: Under Series A the table reads as follows: "Record number 1 was judged better than
record number 2 by 329 judges, better than number 3 by 223 judges, and better than number 4 by
113 judges, etc."

TABLE VII.—RATINGS OF PHONOGRAPH RECORDS BY EXPERT JUDGES


Series A( 

18 Judges 

■n 

Berios B 

18 judges 
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No. 5 
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No. 8 


H 
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3 
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13 
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Series C, 16 judsee II Series D, 18 iudgoa 
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No. 12 
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No. 14 
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3 

14 
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18 

is 
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10 
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Series 18 judges || Series 32 judges 
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32 
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Series I, 14 Judges 
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TabIiB VIII.— CALCUii\TED Valuba AMD Frsxjobncy OP Ube OP Eacii Rbcoud 



PE value from number 12 

Number of times played 

Record 
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Alpha 

Beta 

Gamma 

Delta 

1 

+ .30 

+ .21 

+ .28 

1 
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2 

-2.42 

-1.56 


1 

1 

1 

1 

8 
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1 

4 
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1 

1 

1 
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- .94 
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.. 

ni 

n 
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+ .85 
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+ .37 





7 

- .30 
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8 
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- .75 

- .48 
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n 

B 

1 

0 
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2 
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+2.42 
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0 
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1 

2 

1 
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1 

1 

14 
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16 

- .08 
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- .36 


2 

2 


10 

-l.Ol 

~ .18 


H 

1 

1 

1 

17 

-1.41 
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■5 

■■ 

1 

1 

18 
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+ .66 



1 

1 

19 

+ .61 

+ .05 

+ .28 

1 

1 


1 

20 

-1.92 

- .79 

-1.36 

1 


1 

1 

21 

-3.27 

-2.11 

-2.00 





22 

-7.17 

-6.00 

-0.11 






In general it will be observed that the expert judges rate the better
records higher and the poorer records lower than they are rated by the
non-expert judges. There are peculiar qualities about records
numbers 6, 7, 8, and 11, however, which cause the non-expert judges to
rate them much lower than they are rated by the experts. Perhaps it
is the failure of the non-expert judges to see beyond the obviously
unusual combinations of instruments and phrases. On the other hand,
the non-experts very greatly over-estimate the relative quality of
record number 18.

In the arrangement of the final tests, record number 14 was not
used, for both the matrix and the musical score from which it might
be reproduced have been destroyed. Records numbers 6, 7, and 11
were not used because of the extreme differences between their ratings
by the two types of judges, and numbers 21 and 22 were omitted
because their values are so low as to cause a serious question in many
minds as to whether they have any musical merit whatever.



Fig. 3.

Although record number 8 is under-estimated by ordinary judges,
it was used in the final scales. Care was taken, however, to see that
the lines of its associates in any group in which it appears do not in
Figure 3 cross its own line. In Group B of Scale Delta, for example,
it is associated with number 4 and number 10, both of which have lines
running almost parallel to its line. The general rule was followed in
the selection of all groups that no two records should be played in the
same group if their lines in Figure 3 crossed each other, and that as
far as possible all lines in a single group should be approximately
parallel.
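
The grouping rule just stated can be sketched in code. Figure 3 is not
reproduced here, so the sketch assumes that each record's line joins its
value among the expert judges to its value among the non-expert judges; on
that assumption two lines cross exactly when the two kinds of judges rank
the pair of records in opposite order. The function names and the sample
values are hypothetical and are not the values of any record in the study.

```python
from itertools import combinations


def lines_cross(a, b):
    """Return True if the lines of two records cross.

    Each record is an (expert_value, non_expert_value) pair. Under the
    assumption stated above, the lines cross when one record stands higher
    with the experts and lower with the non-experts.
    """
    return (a[0] - b[0]) * (a[1] - b[1]) < 0


def group_is_admissible(records):
    """Return True when no two records in a proposed group have crossing lines."""
    return not any(
        lines_cross(a, b) for a, b in combinations(records.values(), 2)
    )


# Hypothetical values, for illustration only.
parallel_group = {"X": (1.2, 0.9), "Y": (0.4, 0.1), "Z": (-0.5, -0.8)}
crossing_group = {"X": (1.2, 0.1), "Y": (0.4, 0.9)}

print(group_is_admissible(parallel_group))
print(group_is_admissible(crossing_group))
```

The check covers only the strict part of the rule, that no lines cross; the
further preference for approximately parallel lines would call for a
comparison of slopes, which is left out of this sketch.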

Group A in each scale presents selections which have the largest
amounts of difference between their general values. Group B presents
selections having somewhat smaller differences in their values, which
will naturally result in a larger number of errors when people attempt
to rate their relative merits. Groups D and E present records which
have such small differences in their general values that only well
trained listeners may be expected to distinguish correctly among them.
The increasing difficulty from first to last makes the test a practical
measure of how small a difference can be detected. No selection is
played more than twice in any scale, and so far as possible each is
played but once. In only one case where a record is played more than
once does the second playing come in the next group after the first
playing.

Reliability. Unfortunately the tests do not seem to be as reliable
as one would desire, although they are probably more reliable than any
estimates of ability the average teacher of music would be able to make.
In one of the writer's classes at the University of North Carolina, 44
sophomores were tested first with Scale Gamma and then with Scale
Delta. The coefficient of correlation (Pearson formula) between the
two sets of results was .508, which is low when one considers the time
required to administer the tests, although fairly satisfactory if one
considers the fact that there are only 15 or 16 elements to be rated in
each scale, and that the tests measure emotional rather more than
intellectual reactions. The only other check yet obtained on the
reliability of the tests is a coefficient of correlation of .519 between
Alpha and Beta in a class of 21 high school students.
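
For readers who wish to make the same kind of check on their own classes,
the Pearson coefficient between two sets of scores may be computed as in the
minimal sketch below. The function name and the sample scores are
hypothetical; the original score sheets are not given in the article, so the
coefficients of .508 and .519 cannot be reproduced from it.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least two scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and of squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of six students on two scales, for illustration only.
first_scale = [10, 12, 9, 14, 11, 8]
second_scale = [11, 10, 9, 13, 12, 7]
print(round(pearson_r(first_scale, second_scale), 3))
```

Each student contributes one pair of scores, so the two lists must be in the
same order of students.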

Apparently there is little relationship between the ability measured
by these scales and general academic ability. In the group of sophomores
at the University of North Carolina the lowest score in judging
music was made by the boy who had next to the highest score in the
Miller Ability Test. The coefficient of correlation between the musical
judgment scores and the mental ability scores for the 39 students who
took both tests was only .18, although the quadrant of the correlation
table indicating low academic ability and high musical judgment
contained but one individual. Perhaps one must have at least a
sufficient amount of academic ability to make 70 points on the Miller
Test before he can learn to judge orchestral music successfully. It
seems certain, however, that high general academic ability does not at
all imply high ability to judge orchestral music. Musical appreciation
is apparently a highly specialized trait, although it seems to
be remarkably susceptible to training.



NOTES ON ARTICLES IN EDUCATIONAL 
PSYCHOLOGY IN CURRENT ISSUES OF 
OTHER MAGAZINES 


REPORTED BY CECILE COLLOTON
Department of Educational Psychology, The Lincoln School of Teachers College

Intelligence Tests


What is the IQ? A. H. Martin. The Australasian Journal of Psychology and
Philosophy, 1923, September, 174-176. An explanation in simple language of the
meaning of the term intelligence quotient.

Educational Implications of the IQ. John Adams. The Australasian Journal
of Psychology and Philosophy, 1923, September, 177-100. Emphasizes the need
for an objective standard in education and discusses the arguments for and against
the use of the IQ as such a standard.

A Program Arrangement for Mental Groups. Loo C. Rjwoy. The School
Review, 1923, October, 008-611. How an x, y, z classification of high school
students works in practice. The school has 1080 pupils and the plan has been used
for two years. Grouping is made on the basis of intelligence tests supplemented by
records of actual accomplishment and teachers' estimates.

The Sectioning of High School Classes on the Basis of Intelligence. Gustave A.
Feingold. Educational Administration and Supervision, 1923, October, 300-416.
An experiment in sectioning high school freshmen on the basis of ability proves
advantageous to both pupils and teachers. Seven tables give interesting statistical
data.

Berlin Schools for Gifted Children. Adolph E. Meyer. The Pedagogical
Seminary, 1923, September, 206-210. Describes the intelligence test used for the
selection of gifted children and gives some information as to progress of children
in the special school.

Vocational Tests for Agricultural Engineers. H. E. Burtt and F. W. Ives.
Journal of Applied Psychology, 1923, June, 178-187. Describes a mental test
designed to predict special ability in agricultural engineering.

Some Experiments with Mental Alertness Tests at Northwestern University.
Paul L. Palmer. School and Society, 1923, Nov. 8, 636-640. The results of a 16
minute test required of all Liberal Arts freshmen at Northwestern University are
discussed in detail. The test score is particularly valuable in predicting the success
of the first and fifth quintiles.

Psychological Tests versus the First Semester's Grades as a Means of Academic
Prediction. John 1. Ernst. School and Society, 1923, Oct. 6, 410-420. Correlations
between scores on the Army Alpha and grades of the entire college course,
and between grades for the first semester and the entire course show the Alpha
Test to be a better means of predicting a student's ability than his first semester
grades.

Diagnosis of the Unstable Moron. George Ordahl. The Journal of Delinquency,
1923, March, 99-112. Proposes classifying responses on intelligence tests
according to the relevancy of reactions to determine stability. Five individual
cases furnish illustration.

The Relation between the Intelligence and Vocational Choices of High School
Pupils. Gustave A. Feingold. Journal of Applied Psychology, 1923, June, 143-153.
A comparison of the vocational choices of 512 high school freshmen tested
with a modified Army Alpha, and Fryer's Occupational-Intelligence Scale shows
that only 46 per cent make proper vocational choices; 47 per cent choose vocations
beyond their mental reach; and 7 per cent underrate their ability.

A Study of the Correlation of College Students' Estimates of Intelligence with the
Otis Tests and Other Scales. J. U. Yarbough. Journal of Applied Psychology, 1923,
June, 167-107. Thirty college students rate each other in intelligence on a scale
devised by themselves. Their ratings are compared with the instructor's estimates,
college grades, and scores on Otis group test.

Intelligence Scores of Colored Pupils in High Schools. E. L. Thorndike. School
and Society, 1923, Nov. 10, 600-670. A comparison of the scores of negro and
white children on two intelligence tests shows the decided superiority of the white
pupils. Only 4 per cent of the negroes reach the median of the white children.

Intelligence and Literature. Ross W. Hohn and Thomas H. Briggs. School and
Society, 1923, Oct. 27, 608-610. Compares the reading, voluntary and required,
of 277 high school pupils and their intelligence as measured by the Terman Group
Test of Mental Ability. Forty seven superior and 54 inferior pupils are studied
in detail.

Educational Tests 

Tests in History and the Social Studies. The Historical Outlook, 1923,
November. Practically the whole issue of the magazine is given over to a discussion
of tests in the following articles:

1. Improving the teaching of history through the use of tests. Bertha Elston.

2. List of History Tests.

3. Written Examinations and their Improvement. Prof. W. S. Monroe.

4. Examples of History Tests.

5. New Tests for Old. Richard H. Shryock.

6. New Types of History Tests. F. E. Moyer.

7. Evaluating the Aims and Outcomes of History. Earle Rugg.

8. New Kinds of Tests in Social Science. Ruth E. Hardy.

Some Investigations Concerning the Use of Certain Home Economics Information
Tests. Anna M. Cooley and Grace Reeves. Describes three Home Economics
Tests dealing with (1) studies pertaining to clothing, (2) studies pertaining to food,
(3) other household activities. Findings based on 1066 tests in 40 different schools
in 80 towns or cities of the United States are discussed.

A Dictionary Test. Thomas H. Briggs. Teachers College Record, 1923,
September, 366-306. A diagnostic test to determine how much instruction
children need in the use of the dictionary. Not standardized. Based on Webster's
Standard School Dictionary.

Case Studies 

The Psychological Clinic, 1923, March-April

1. Patsy. Margaret 0. Brooke, 41-43. The boy who found a needed friend in
the clinician.

2. Alliion and His Parents. Bernice Leland, 44-47. The parents'
responsibility for a child's abnormal reactions and attitude.

3. Maurice. Catherine Higgs, 62-B5. Diagnostic teaching.

4. Gladys. Beatrice M. McCally, 60-68. Diagnostic teaching.

5. Albert. Helen W. Brown, 60-60. Diagnostic teaching.

The Superior Child. Alice M. Jones. The Psychological Clinic, 1923, March-
April, 1-8. The first of a series of case studies. Describes four children with
exceptionally high IQ's showing the great range of individual differences.

The School Psychologist. R. B. W. Hutt. The Psychological Clinic, 1923,
March-April, 48-61. Individual cases show what the psychologist can do to help
"failures" in school.

Miscellaneous

A Plan of Organization for Taking Care of Bright Pupils. W. C. French. The
Elementary School Journal, 1923, October, 100-108. Describes special courses
given in a small school system by the regular teachers to "enrich the curriculum"
for the superior child.

What Can the Secondary School Do for the Student of Low IQ? Margaret M.
Alltucker. The School Review, 1923, November, 063-6(J1. Suggestions for
modifying curriculum and instruction to meet the needs of the child of limited
mental capacity and make him an efficient member of society.

The Relation between Physical and Mental Development. Mary L. Dougherty.
The Elementary School Journal, 1923, October, 130-134. A study of twins (girl
and boy) with reference to mental and physical development and personal
characteristics.

The Subnormal Child. Walter E. Fernald. School and Society, 1923, Oct. 6,
807-406. Paper read at Harvard Teachers Association. Proposes a program for
the salvaging of the subnormal child in the public schools.

Personnel Work at Its Source. Ruth Swan Clark. School and Society, 1923,
Oct. 27, 487-401. Educational and vocational guidance in New York City
schools. The use of tests in determining vocational work.

How an Instructional Research Department Can Assist Teachers. P. T. Rankin.
Journal of Educational Research, 1923, October, 187-lOB. To make for greater
efficiency in the learning process, testing in the school subjects should be carried on
by the teacher with the help of the research department in the selection and the
interpretation of the tests.

Improvement in Rating the Intelligence of Pupils. G. F. Varner. Journal of
Educational Research, 1923, October, 220-232. Proposes a scale for rating
intelligence which takes into account five factors usually tending to make teachers'
ratings unreliable. Describes the use of the scale in two school systems.

A Rating Scale for Individual Capacities, Attitudes and Interests. W. Hardin
Hughes. The Journal of Educational Method, 1923, October, 60. Describes
a scale used in the Pasadena junior and senior high schools. Fifty students are
rated at one time on 12 character traits and 10 special interests. Each group of 50
is divided by a "man-to-man" comparison into five groups, from 2 to 6 being
placed in the lowest and highest groups, 10-16 in the inferior and superior groups
and the remainder in the average group.

Comparative Social Traits of Various Races, Second Study. C. B. Davenport
and Laura 0. Crayton. Journal of Applied Psychology, 1923, June, 127-134.
One hundred and eighty-eight sets of judgments on 102 high school students give
tentative conclusions as to the differences in 10 social traits among Germans, Irish,
Italians, Austrians and Russians. Three tables present the statistical data.

A Study of a Small Group of Irish-American Children. Rebecca E. Leaming.
The Psychological Clinic, 1923, March-April, 18-40. Data from 110 cases show
definite racial differences and characteristics in test results. Seventeen cases are
described in detail.

A Study of 1000 Children Who Do Not Conform to School Routine. Selinda
McCaulley. The Psychological Clinic, 1923, March-April, 0-17. Emphasizes
the need for a specialized curriculum for the backward child.

Delinquents and Non-delinquents on the Downey Will-Temperament Test. Edythe
K. Bryant. The Journal of Delinquency, 1923, January, 46-04. Reports the use
of the Will-Profile Test with 420 normal boys. Comparisons are made with 100
tests of delinquent boys described in a previous article.

The Diagnostic Value of Individual Record Cards. Clay Campbell Ross.
Educational Administration and Supervision, 1923, October, 430-444. Aims at
the prediction of success in high school through a study of elementary school
records by the method of partial correlations. Spelling record the best single
measure of fitness for high school according to data from 42 cases.

An Analysis of Multiplication Drill. F. B. Knight. Journal of Educational
Research, 1923, October, 100-207. A detailed analysis of two sets of practice
exercises in multiplication shows important psychological differences with reference
to the laws of learning.

Syllabication as a Factor in Learning to Spell. Harry A. Greene. Journal of
Educational Research, 1923, October, 208-210. Summarizes three experimental
investigations of the problem of syllabication and reports a fourth in detail.

The Effect of Locality on Language Errors. Dagny Sunne. Journal of Educational
Research, 1923, October, 230-261. A study of the written work of 8018
children in Louisiana schools. Results compared with the Charters report show
that in general syntactical errors hold relatively the same order in different
localities, but that many language errors are peculiar to the community.

A Method for Measuring the "Vocabulary Burden" of Textbooks. Bertha A.
Lively and S. L. Pressey. Educational Administration and Supervision, 1923,
October, 380-308. The vocabulary difficulty of 16 books and one newspaper is
evaluated by a study of thousand-word samplings with reference to the range of
vocabulary, number of zero value words, and weighted median index number.

The Test-study Method versus the Study-test Method in Spelling. John H.
Kingsley. The Elementary School Journal, 1923, October, 126-120. An
analysis of the spelling records of Grades V to VIII over a period of two years
shows the great superiority of the test-study method.

The Promotion Plan in the Horace Mann Elementary School and Kindergarten.
Clara Chassell. Educational Administration and Supervision, 1923, October,
446-447. Describes the various tests and measures used in the school and the
method of determining composite scores.

A Study of Emotional Stability in Children. Ellen Matthews. The Journal of
Delinquency, 1923, January, 1-40. Reports the result of the use of a modification
of the Woodworth personal data sheet with an unselected group of 1133 children
and 486 selected children. The effect of sex, race, age, intelligence, etc. is studied.
Complete statistical data are given.

The Spelling of Homonyms: An Experimental Investigation of Teaching Them.
E. O. Finkenbinder. The Pedagogical Seminary, 1923, September, 241-251.
Concludes from the experimental data that the "separate method" is more effective.

A Two-year Experiment with Vocational Guidance in a Woman's College. Iva
Lowther Peters. The Pedagogical Seminary, 1923, September, 225-240.
Illustrates what can be done by a vocational bureau working in close cooperation with
the administrative and academic departments.

Latin as a Preparation for French. Thomas J. Kirby. School and Society,
1923, Nov. 10, 503-500. Data secured from 208 freshmen at the State University
of Iowa show a positive though low correlation between years of Latin studied in
high school and marks made in first and second semester French in the University.

New Tests

Ruch-Popenoe General Science Test. Giles M. Ruch and Herbert F. Popenoe.
A test of achievement in general science for grades 7 to 9. Can be given in a 46
minute period. Two alternative forms. Price per package of 26 examination
booklets including Manual of Directions, 1 Key, 1 Percentile Graph, and 1 Class
Record, $1.50 net. Specimen set .85. Published by World Book Co.,
Yonkers-on-Hudson, New York.

Otis Classification Test. Arthur S. Otis. A combined mental ability and
educational achievement test for use in the regrading and classifying of children in
Grades IV-VIII. Requires 60 minutes actual working time. Scoring is very
simple. Two alternative forms. Price per package of 26 booklets, 1 Key, 1
Interpretation Chart and Percentile Graph and 1 Class Record, $1.80. Manual
of Directions .26. Specimen set .36. Published by World Book Co.,
Yonkers-on-Hudson, N. Y.

Morrison-McCall Spelling Scale. J. Cayce Morrison and William A. McCall.
Eight lists of 50 words each for testing spelling ability of pupils in grades 2 to 8.
Directions for giving the test, scoring papers, and interpreting results are contained
in the same pamphlet with the word lists, illustrative sentences, and tables of age
norms and T scores. Published by World Book Co., Yonkers-on-Hudson, N. Y.

Lewis English Composition Scales. Erwin Eugene Lewis. The first tests
designed for measuring business and social correspondence. Separate scales for
measuring order letters, letters of application, narrative social letters, expository
social letters, and simple narration. Complete directions are printed in the booklet.
Price .26. Published by World Book Co., Yonkers-on-Hudson, N. Y.

Van Wagenen English Composition Scales. M. J. Van Wagenen. Separate
scales for exposition, narration, and description. Thought content, structure, and
mechanics are rated separately and averaged for a general merit score. Complete
directions are provided in the booklet. Price .26 net. Published by the World
Book Co., Yonkers-on-Hudson, N. Y.

The Lohr-Latshaw Latin Form Test for High Schools. Harry Franklin Latshaw.
Published as Number 1 of the "Studies of Education" by the Bureau of Educational
Research, University of North Carolina, Chapel Hill, N. C.

Blackstone Stenographic Proficiency Tests. E. G. Blackstone. Typewriting
tests in five alternative forms. $1.00 per package of 26 including a Manual of
Directions, a Percentile Graph and Record Sheet. Specimen sets, .26. Tests in
note taking and transcribing are now being prepared. Published by World Book
Co., Yonkers-on-Hudson, New York or 2126 Prairie Avenue, Chicago.



NEW PUBLICATIONS IN EDUCATIONAL
PSYCHOLOGY AND RELATED FIELDS OF
EDUCATION


DEPARTMENT IN CHARGE OF LAURA ZIRBES¹

1. Two Introductions to Psychology. In few college courses does
the content of the course differ so widely from institution to institution
as in the subject of psychology. What is taught under that name
in one school is anathema in another. And this is probably more
true today than it was a generation ago, when the psychologists of this
country were more or less loyal to the leadership of James. Although
the present state of affairs may be deplored by some psychologists, it is
regarded by others as indicative of the healthy and vigorous growth of
this new and young science. The lusty infant will not stay put. It
refuses to be limited and confined to any one particular path.

This great diversity of opinion as to what should be the content of a
course in psychology is well emphasized in two introductions to psychology
recently published. One is by Seashore,² a name well known
to all students of psychology, and the other is by Griffith,³ one of the
younger workers in the field. Each of the authors in the preface says
that he is trying to make the subject vital for the student. Seashore
wants the student to "psychologize;" he wants psychology to function
in the life of the student. Griffith would show the student how vitally
psychology is concerned with the business of living in all its various
ramifications. And so each writer sets forth upon his way, and the two
ways seldom, if ever, meet.

Seashore devotes about a hundred pages to sensation; Griffith about
eight. Seashore treats in orthodox fashion of perception, attention,
association, memory, thought; Griffith has no special treatment of these
topics as such. Abnormal psychology, social psychology, industrial
psychology are all treated at length by Griffith. Only as an afterthought
does Seashore make room at the end of his book for a chapter
on dreams and a brief chapter on individual psychology. They do
not seem to belong in the logical system he has built up.

¹ All unsigned reviews were prepared by Laura Zirbes.

² Seashore, C. E.: "Introduction to Psychology." New York, Macmillan Co.,
1923, pp. XVIII + 427.

³ Griffith, C. R.: "General Introduction to Psychology." New York,
Macmillan Co., 1923, pp. XV + 613.

It would seem to me that both books have their merits and their
drawbacks as introductions to psychology. A student who follows
Seashore may conceivably get a respect for psychology as a well-ordered
science, somewhat dry and formal, and very much removed
from daily life. If he continues his study, this rigid training will
probably be of value to him. On the great majority of students, who
take only one course in psychology, the impression will be left
that psychology has a lot to say about the abstruse facts of mental
life but relatively little about the business of living. It would seem
difficult to justify the teaching of innumerable facts about sensations
to the ordinary student. Of what pedagogical value are accommodation,
convergence, retinal image, external projection, and the like?
And so fact after fact is marshalled before the reader, with only the
relief of the "exercises" scattered throughout the book.

Now the average student comes to his course in psychology with a
very vague notion of what psychology is all about. He has his own
notions, generally very fantastic. But he does think that psychology
will tell him something about hypnotism, or character-reading, or
phrenology, or will-power; that it will teach him how to think correctly
or develop more personality; that it will explain insanity and tell him
whether animals reason or not. All these and many other things are
treated in a delightful fashion by Griffith. I can imagine the student
skipping Part One of the text, which deals with a discussion of
structuralism, functionalism, behaviorism and similar "isms," and
becoming completely absorbed in the rest of the book, which deals with
genetic, social, abnormal and applied psychology. There are few
books that would answer so well the many questions which the first-year
student in psychology raises, as this text by Griffith. Naturally the
treatment of any one field is at times sketchy and the specialist in any
one of the numerous fields will be dissatisfied with the treatment in his
particular field, but, considering the purpose of the author, on the whole,
the book is excellently planned and executed.

To the student of education, neither book makes any decided 
appeal. R. P. 


2. Educational Progress. In his book entitled "Progressive Education,"¹
Professor Mirick first presents an outline of the difference
between the philosophical basis of considering educational topics and
the modern factual basis. No one can justly claim that the scientific
method of studying educational problems is as yet firmly established.
Its merits are so rapidly becoming recognized that even though a good
many persons are not fully convinced regarding scientific study, they
do appreciate that it is not quite respectable to fight against the scientific
study of education. The author argues that although scientific
study tends constantly to produce change, and philosophic study
tends to keep things as they are, the two points of view should be used
to balance one another in considering educational topics. Scientific
thinking "tends to instability and restlessness." "Without philosophic
thinking there would have been no customs, no institutions, no
organized society, nothing stable on which mankind could stand while
preparation was being made for the next advance." In advocacy of a
scientific point of view as well as one that is philosophic, it seems that
the author exerts himself over-much to produce a philosophic analysis
of modern scientific study as relates to education. Though his
discussions are often not easy to follow, his conclusions given in the
summaries are clear and thought-provoking. For example, "All
education is self-education."

"All education has to do primarily with impulses and desires
rather than with presentation of material."

"Progressive education is just real education. Science confirms,
and improves the educational policies that 'natural' teachers have
always followed."

"Much of human contact can be understood only as it is seen from
the biological point of view."

¹ Mirick, George: "Progressive Education." New York, Houghton Mifflin
Co., 1923, pp. X + 314.

Throughout, the book embodies current educational thought and
literature, citing authorities regularly and abundantly. Indeed, the
book will be found most useful as a summary of the modern educational
philosophy which has been presented in books and magazines. Almost
no effort is made to include a presentation of data from experimental
work in the improvement of practice. If specific experimentation is
a determining factor in educational progress, this is a significant lack.
There are now something like fifty schools in America which may properly
be called experimental in their efforts to help education to progress.
Books including such titles as "Progressive Education" and the
"School Curriculum" would be helped if they would present the programs
and procedures of a score or more of experimental schools which
are trying, with acknowledged shortcomings, to apply the scientific
method in developing improvements in education.

Otis W. Caldwell.


3. Outlines of Courses in Psychology. The printed outline of a course
to be used by the student is a teaching device evidently growing in
popularity. Professor Gifford presents two such outlines, one for
general psychology,¹ and the other for educational psychology.²
Both of these outlines are very well done, and will doubtless be of
great help to the author's students. They may interest other teachers
of psychology in connection with the planning and arrangement of
their courses. The reviewer does not believe that any teacher of
psychology should use someone else's outline in actual classroom work.

R. P.


4. A Child Study Manual for Parents and Teachers.³ This is a well
organized outline of 51 topics, each of which deals with some phase of
child life, from infancy through adolescence. Each section or chapter
is introduced by a two or three page statement which gives a pre-view
or point of departure for reading, study and discussion. Then follows
a brief summary in outline form and a rather comprehensive list of
specific references. The "popular" sources are listed first so that the
group discussion may start with the more general presentations and
continue the study through the "non-technical" readings to the
"technical" treatment of specific phases of the subject.

This outline will be of great value to a very small proportion of all
parents who are interested in child study. It presupposes leisure time
for study, training sufficient to read and understand the wording of the
text and sources quoted, and ability to organize material thus secured
and interpret and apply it to particular problems. But the average
parent, comprising a much larger proportion, is not capable of using
such an outline. It is not intended for such, but may a plea be presented
here for another outline of child study? The first requirement
should be that the material be organized by those who, like the editor
of this outline, really know the subject-matter. Second, the material
should be presented in language that the average parent can grasp.
Third, it should require a minimal amount of study and search
through books which are often inaccessible and uninspiring to the
man or woman tired out by the day's work. Fourth, it should be
adapted to the problems of the average home and vitalized by the
inclusion of numerous simple concrete episodes such as are written by
Angelo Patri in the New York Evening Post. That parents are
becoming more and more interested in child study is evidenced by the
numerous articles in newspapers and magazines and the appearance
of whole sets of books which claim to solve all the problems of child
training. There is a great need for study outlines written
by those who know, and in language which does not make the average
parent feel that the discussion is too theoretical, technical and remote
from his actual problems.

¹ Gifford, W. H.: "Introduction to Psychology, A Syllabus." Harrisonburg,
Va., Garrison Press, 1923, pp. 36.

² Gifford, W. H.: "Introduction to the Learning Process, A Syllabus in
Educational Psychology." Harrisonburg, Va., Garrison Press, 1923, pp. 34.

³ Gruenberg, Benjamin C.: "Outlines of Child Study." Edited by B. C.
Gruenberg for the Federation for Child Study. New York, Macmillan Co., 1922,
pp. XX + 260.

Persons charged with the responsibility of organizing study groups
and leading discussions of problems of child study will meanwhile be
glad to avail themselves of this outline and will find the references
well selected and carefully organized.

Bertha Miller Rugg.

5. A Scientifically Determined Course in Handwriting.—Here we
have an organized sequence of aims and exercises based on clearly
defined psychological and pedagogical principles which govern learning
and writing. In the four [illegible] chapters the experimental and
investigational basis for a course of writing lessons for the six grades of
the elementary school is discussed. In view of their derivation the
lucid statements of aims and standards dispose [illegible]
detailed prescription of daily procedure, which might otherwise [illegible]
arbitrary and over didactic. The content of the [illegible]
has double sanction. (1) It is selected to fill [illegible]
needs. (2) It shows a realization of the instrumental function of
handwriting in relation to other school subjects which [illegible].
This handbook is another milestone along the path which
leads past empirical commercial systems and procedures to scientifically
determined educational standards, methods and [illegible].

¹ Freeman, Frank N. and Dougherty, Mary L.: "How to Teach Handwriting."
New York, Houghton Mifflin Co., 1923, pp. V + 305.

6. Socialized English.—A dozen verbatim reports of socialized
recitations in the varied aspects of eighth-grade English¹ will prove
not only enlivening to any teacher of English but also very practically
suggestive. If one is already committed to this type of recitation,
he will, no doubt, react quickly to the fervor of an editorial introduction
by James E. McDade, who says, “The recitation soars far above the
old level of monotonous routine, and becomes a game, a romance, an
adventure even.” The reports themselves, unlike many others of
similar import, do not conceal the fact that a very capable teacher in
the offing somehow sees to it that the romantic flight above the old
level starts and finishes on the ground.

Practical Assistance for Teachers in Service.²—The practice of stating
the exact purpose of a book in the author's preface considerably
simplifies the task of a reviewer. It furnishes him with a sort of yard stick
by which to judge the work. In the preface of the book under
consideration, the author says, “The justification for this, another
treatise on the elementary course of study, is the fact that
comparatively a very small number of the teachers in the elementary schools
are so situated as to have immediately the results which are
continually being discovered in the new aspects of educational theory and
practice. The author offers no excuse for playing the role of codifier
for these newer findings in education. For he regards this as a distinct
sort of contribution necessary for educational progress.”

The book, then, is a résumé, not an exhaustive treatise of any one
aspect of the problem of curriculum building. Some general
principles of elementary education are discussed in the first 36 pages.
Each subject of the curriculum is then considered separately; the
subject-matter for each grade, and some discussion of method and the
measurement of results are included. There is nothing epoch making
about this book; if the potential readers could come into direct contact
with the subject-matter which it seeks to codify, rather than with the
codification, more satisfactory results in the change of teaching method
might be expected. But the average teacher in service will find here
considerable inspiration and practical help for the daily routine of
classroom teaching.

Edwin H. Reeder.

Teachers College, N.Y.C. 

¹ Husch, Louise C.: “The Socialized Recitation in English.” Modern
Education Series. Chicago, The Plymouth Press, 1923, pp. 88.

² Phillips, Claude A.: “Modern Methods and The Elementary Curriculum.”
New York, The Century Co., 1923, pp. XIII + 380.




